Preface


We are happy to welcome you to Humboldt-Universität zu Berlin, which is hosting this event under the conference
theme “Breaking Down Walls: Culture, Context, Computing”. This year’s iConference is the first
iConference to take place in Europe.

Humboldt-Universität zu Berlin has been an Open Access advocate for many years, and we are
pleased to continue the tradition of publishing the conference proceedings in Open Access. The compilation
is deposited in the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS;
https://www.ideals.illinois.edu/handle/2142/45869). Like the 2013 proceedings, this iConference proceedings
volume is published under the name of the iSchools as its publisher.

All proceedings contributions (Papers, Notes, Posters, Workshop Descriptions, Social Interaction
and Engagement Descriptions) are deposited in PDF/A format for the purpose of document citation
and long-term preservation, as well as in EPUB and Mobipocket formats for mobile devices. To produce
the EPUB and Mobipocket files we used the open source software Calibre. Each contribution was assigned
a resolvable Digital Object Identifier (DOI).

The process of this compilation is documented for the purpose of exchanging ideas and re-using
experiences. The documentation will be available online shortly after the conference via
urn:nbn:de:kobv:11-100215746.

iConference 2014 promoted the provision of research data for re-use and transparency in research.
Our aim was to give all participants the possibility to publish and link the research data on which their
proceedings publications are based. In the end we linked just a single research data set. This fact
underlines that the establishment of a data sharing and archiving culture is still in the early stages of
development in LIS; or, in keeping with the conference theme “Breaking Down Walls”, from small beginnings
come great things. This year’s conference gives us the chance to bear this in mind when we discuss topics
such as research data management in the context of Library and Information Science research and teaching.


Maxi Kindling (Proceedings Chair)
Pushback: The Growth of Expressions of Resistance to Constant Online Connectivity
1 University of Washington iSchool


Abstract 

As a result of the increasing connectivity provided by smartphones, wireless Internet availability, and
portable devices such as laptops and tablets, technology users can be, and often are, continuously connected
to the Internet and its communication services. However, many technology users who first embraced
constant connectivity are now pushing back, looking for ways to resist the constant call to be permanently 
connected. This pushback behavior is starting to appear in the popular press, in personal blogs, and in a 
small number of academic studies. “Pushback” is a growing phenomenon among frequent technology 
users seeking to establish boundaries, resist information overload, and establish greater personal life 
balance. This study examines a growing body of both academic and non-academic literature in which we 
identified five primary motivations and five primary behaviors related to pushback by communication 
technology users. Primary pushback motivations include emotional dissatisfaction, external values, taking 
control, addiction, and privacy. Primary pushback behaviors are behavior adaptation, social agreement, 
no problem, tech control, and back to the woods. The implications of these motivations and behaviors
surrounding pushback to communication technology are discussed.

1 Introduction


In 2011 the New Yorker magazine published a controversial column, “The Information: How the Internet 
Gets Inside Us” as part of The Critic at Large section (Gopnik, 2011). The author discussed how works on 
the cultural transformations in the information age tend to fall into one of three categories: the
Never-Betters, who euphorically exalt the contributions of technology to improve our lives; the Ever-Wasers, who
claim nothing has really changed and insist innovation is really nothing new; and the Better-Nevers, who 
bemoan the ways in which technology negatively impacts our daily lives and espouse nostalgia for the good 
old days before the Internet. However, in the almost three years since that publication, the technology user 
landscape has already changed. A new category of expressions is now clearly palpable in the media: a 
“Better-Less” group of discontents who used to be euphoric embracers of the opportunities of technological 
connectivity, but who are now looking for ways to push back and resist, to manage or reduce their use and 
perceived dependence on technology. Formerly embracing the changes that the information age has wrought, 
these capable, comfortable users of technology are now expressing doubt and looking for ways out.

A backlash to the exuberant reception that accompanied the introduction of recent technology 
innovations, from smartphones and tablets to Facebook, Twitter and other social media tools, may be 
inevitable. This paper reviews a growing body of literature, both academic and non-academic, about 
expressions of resistance and saturation with communication technologies and overload of information and 
relationships that they entail. UW researcher Kirsten Foot analyzed the emergence during 2008-2010 of 
discourses of pushback in multiple sociopolitical realms. She notes that “recent studies in this vein have 
focused on identity and class performance aspects of social “media refusal” (Portwood-Stacer, 2013) and 
“internet resistance” (Woodstock, 2011), but conceptualizes pushback more broadly, to include discourses 


iConference 2014 Stacey Morrison & Ricardo Gomez 


about reducing or avoiding media use, altering media practices, and efforts to influence media policies” 
(Foot, in review). Convergent with Foot’s approach, we define pushback to connectivity as a reaction 
against the overload of information and changing relationships brought about by communication 
technologies such as smart phones, tablets and computers connected to the Internet. Overloaded users are 
pushing back against permanent connectivity, in an attempt to manage, limit or control their exposure and 
the saturation caused by ubiquitous and constantly connected communication technologies. 

Pushback is a relatively recent phenomenon; it has only recently started to appear in academic 
research sources, although it is more common in personal websites, blogs, magazines and newspapers from 
the last few years. We review these different types of sources, and offer a typology of motivations and
behaviors for pushback. We identified five different types of motivations for pushback, as well as five 
different types of pushback behaviors. However, all forms of pushback have a common denominator of 
dissatisfaction or disillusionment with one or more types of technology and/or social media, and the users’ 
desire to pull away from technology usage in some way. A closer examination of the pushback phenomenon
can offer a better understanding of technology user behavior and lend insight into how people connect with 
each other, with or without communication technologies. Our typologies can be used to inform future 
empirical studies about pushback and resistance to connectivity. 

From the standpoint of Human-Computer Interaction (HCI), this response raises questions about 
technology design and how to better serve users. From an economic standpoint, pushback calls into question 
how long each new technology innovation can last as a viable profitable enterprise, and whether business 
models need to account for these motivations and subsequent behaviors that manifest as pushback. From 
a psychological perspective, pushback sheds light on the deeper emotional needs and desires that people 
seek to fulfill through technology. From a humanist and philosophical position, it suggests that the Internet, 
accessed in so many ways, is not an easy answer to the human desire for connection with others. But in the 
end, this desire for connection is what frequently drives people to remain tethered to their devices, despite 
the feelings of dissatisfaction with technology. 

The remainder of this paper presents the methods employed in the study, followed by a description 
of some of the salient findings regarding pushback to connectivity. We then discuss these findings and 
suggest a typology of motivations and of behaviors that emerged from a review of the literature, and we 
conclude with some of the implications and possible areas for future research uncovered by this exploratory 
study. 


2 Methods: a literature review on Pushback 


After a systematic review, we compiled 73 sources, with roughly a third of them coming from personal blogs 
and websites, a third from popular media sources, and a third from academic conferences and journals. In 
an iterative process of clustering and coding, we identified two distinct themes: motivations that drive users 
to push back, and pushback behaviors, the things people do when pushing back. All sources were then 
coded along these two themes, which resulted in the emergence of five types of motivations, and five types 
of behaviors. 

For each source, we identified the primary motivation and behavior discussed or exhibited by the 
user/users as a means of establishing the most pervasive expression of pushback. Some sources discussed 
both motivations and behaviors, and many discussed two or more motivations and/or behaviors, which 
means the typologies are not mutually exclusive. This was especially true of the personal testimony of 
bloggers, who may feel a need to defend their pushback choice with multiple reasons, anticipating 
judgmental or questioning responses from their readership. In these instances, the primary motivation was 
often the first one discussed by the blogger. Secondary motivations followed. In research studies, the primary 
motivations were often less distinct, and in some cases, this was a result of the focus of the research itself. 
Nevertheless, we centered on the most salient or conclusive results determined by the research studies.
We then returned to each source and established secondary motivations and behaviors, if relevant. Users
often express multiple reasons (motivations) and methods (behaviors) of withdrawing or filtering their 
technology use. We compiled the data arriving at two sets of measurements: one for primary motivation 
and behavior and a second set of data measuring the frequency of all (primary or secondary) user 
motivations and behaviors as they appear overall in the coding. An assessment of both primary and 
secondary motivations and behaviors offers an overall picture that is, in some cases, different than when it 
is based only on primary drivers. We include this information as part of our data in the “overall” category 
in each case. 
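
The two sets of measurements described above can be illustrated with a minimal sketch. It assumes each coded source is stored as a record whose list of codes begins with the primary code and continues with any secondary codes. The record layout, the `tally` helper, and the three sample sources are hypothetical and are not the study's data or tooling.

```python
from collections import Counter

# Hypothetical coded sources, invented for illustration only.
# Assumption: the first code listed is the primary one; the rest are secondary.
coded_sources = [
    {"id": "blog-01", "motivations": ["addiction", "taking control"]},
    {"id": "press-01", "motivations": ["emotional dissatisfaction"]},
    {"id": "paper-01", "motivations": ["emotional dissatisfaction",
                                       "taking control", "privacy"]},
]


def tally(sources, field):
    """Return (primary, overall) frequency counters for one coded field."""
    primary = Counter()
    overall = Counter()
    for source in sources:
        codes = source.get(field, [])
        if not codes:
            continue
        # Exclusive measure: exactly one primary code per source.
        primary[codes[0]] += 1
        # Non-exclusive measure: every code counts once per source.
        overall.update(set(codes))
    return primary, overall


primary, overall = tally(coded_sources, "motivations")
```

In this invented sample, "taking control" never appears as a primary motivation yet is present in two of the three sources overall, which is the kind of difference between the two measures that the "overall" category is meant to surface.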


3 Findings: Pushback in Blogs, Popular Press and Academic Research 


Personal web pages and blogs are the most common source of expressions of pushback to connectivity.
Ironically, people discontent with aspects of technology use technology to complain about it, though some 
bloggers, in particular, seem to be very aware of this irony. They address their audience as peers, discussing 
their experiences in a reflective way, confessing their fears and confusion to those who they presume might 
share the same concerns. For example, in the March 2012 entry “I Got Rid of My Smartphone” on his blog 
The Rich Life, young engineer Casey Friday writes: 


A lot of people have asked, ‘Why don’t you just use it less?’ I think that’s sort of like asking a 
crack addict, ‘Why don’t you just put the crack in the closet and do less blow?’ I don’t even want 
the option of using a smartphone, because if I have one, I will check it obsessively. It’s a simple 
fact. (Friday, 2012, para. 13). 


Personal accounts of disenchantment with technology fall short of a movement, but they represent a 
grassroots groundswell of activity. Sometimes, they are picked up by the press. Media coverage of changes 
in social media user behavior highlights studies, surveys and polls, denoting what we call the pushback 
movement as more than a collection of isolated anecdotes. In “The anti-social network: Life without 
Facebook” (2012), CNN.com reported: 


With a website that boasts 901 million active users and is launching an IPO on Friday, it seems 
unlikely that once you get on Facebook, you'd ever leave. But deactivating from the social 
networking site is not that unusual. Close to half of Americans think Facebook is a passing fad, 
according to the results of a new Associated Press-CNBC poll. More and more people are stepping 
away from the technological realm and de-teching (para. 4). 


Two recent books, Alone Together by M.I.T.’s Sherry Turkle (2012) and the Pulitzer Prize finalist The 
Shallows by Nicholas Carr (2011), ask broad ethical questions about how our interaction with the Internet 
and technology is profoundly shaping our lives, even changing our brains, affecting both the depth of our 
relationships and the depth of our thinking. References to both works appear frequently in many sources as 
inspirational work to explore or engage in pushback to connectivity. A recent literature review “Discerning 
Rejection of Technology” by Murthy and Mani (2013) reports that technological complexity, technology 
fatigue, switching cost or loss aversion were among the most consistent reasons for user rejection of 
technology. An assessment of academic research published in peer reviewed conferences and journals reveals 
three different types of approaches in studies of pushback to connectivity, from the perspectives of 
information and communication, of psychology, and of youth studies. 

As an example of information and communication approaches, in 2010 Jennifer Rauch, an Associate 
Professor of Journalism and Communication Studies at Long Island University in New York, explained the 
history of the “slow media” movement in the online journal Transformations. Pushback can be seen as a 
piece of this larger movement that began as an offshoot of a larger central philosophy. In the article, Rauch 
provides a broad historical framework for seeing the rise of technology resistance. She writes:


Since the turn of the 21st century, people from diverse walks of life have begun to form a
sub-cultural movement whose members reduce their overall time spent with media and/or their use of
specific communication technologies in order to constrain the influence of digital devices and 
networks on their personal, professional, and family lives (Rauch, 2010, para. 1). 


Examples from clinical psychology include the idea of “unplugging”, which first became popular in 2010
(Rowan). In January 2011, the American Psychological Association sanctioned a series of four research studies,
which are discussed in the paper “A Two-Process View of Facebook Use and Relatedness Need-Satisfaction:
Disconnection Drives Use, and Connection Rewards It”. The researchers conclude that:


Overall, Facebook use appears to be a positive phenomenon, although perhaps not as positive as 
face-to-face sociality. However, Facebook may also offer an overly tempting coping device for the 
lonely, one that feels good but does not actually address underlying feelings of social disconnection 
in life (Sheldon et al., pp. 773-774).


By 2012, other scholars had started looking at pushback. Foot (2012) explored pushback behaviors in the 
political/military, organization/work, and personal/relational realms, and suggested the latter are generally 
motivated by a desire for freedom from being always on, deeper connection in relationships, creating space 
for kids to be kids, higher attention to signals/noise ratio, and dealing with privacy concerns; some of these 
motivations were corroborated in our study, as we will see below. Other scholarly work more deeply 
examined the experiences of younger technology users as well. Previous research had suggested that younger 
users, “digital natives”, people born into the age of everyday technology usage, fared much better in terms 
of adopting technology, responding positively to it, and managing technology better than their parents, the 
“digital immigrants”, those not raised in a technology-heavy environment (Prensky, 2001, pp. 1-6). Not
surprisingly, the Kaiser Family Foundation (2010) published the findings of one of the largest U.S. research 
studies of children 8-18 and their relationships with a variety of media outlets, finding a sharp increase in 
all media usage. 


4 Analysis: Pushback Motivations and Behaviors 


After analyzing the different source materials on pushback to connectivity, including blogs, popular press 
and academic sources, a typology emerged with five types of motivations, and five types of behaviors. Each 
one is described in more detail below. 


4.1 Five Motivations for Pushback 


We were surprised to find five remarkably consistent types of motivations that lead people to push back 
and resist connectivity, according to the literature we examined. While our preliminary reviews had led us 
to expect that users might indicate a desire to push back against technology as a result of frustration with 
the operation or repeated learning of new technology, fatigue resulting from this learning, or as a reaction 
to technology upgrading cost, this was not what we found in the literature. Instead, we found that the 
motivations for pushback and resistance that appear in the literature were deeply grounded in emotions, as 
we will see in the five types of motivations that are described below. 

One exception to this trend is a recent literature review “Discerning Rejection of Technology” by 
Murthy and Mani (2013). Their study relies heavily on older academic research and technology trade 
publications, mostly based on literature published before 2010 and with many references to pre-2000
literature. In that study, the authors argue that technological complexity, technology fatigue, switching cost or
loss aversion were among the most consistent reasons for user rejection of technology (Murthy & Mani, 
2013). Our findings do not corroborate these claims. Instead, we found that the “cost” that users today are 
most concerned with is the emotional cost of technology. Even in regard to privacy, which is undeniably a 
legal and civil rights issue for users, the greater user concern about privacy was typically rooted in either
fear of embarrassment or frustration with an inability to control an online identity, more than it was a
matter of a fear of piracy, theft or disclosure of legal or financial matters.

Below are brief descriptions of the motivations for pushing back against technology and the 
technology user behaviors that we found emerging from the literature. These are followed by a chart with
their relative frequencies, both as a primary characteristic (exclusive) and as an overall characteristic
(non-exclusive).


4.1.1 Emotional dissatisfaction: 
Users pushing back because their needs are not being met 


Emotional dissatisfaction is often accompanied by disappointment, a result of having had high expectations 
regarding the technology that were not satisfied. Emotional dissatisfaction can involve bitterness or even 
anger, as users had adopted a form of technology use with hopeful expectations only to be disillusioned. 
Some research suggests that this is as much a result of the personality of the user as it is an issue with the
technology (e.g., Moore & McElroy, 2011; Krasnova et al., 2013). An example of clear emotional
dissatisfaction is expressed in a blog:


For me, Facebook wasn't even a tool that fosters maintaining real relationships with old friends 
(and I mean real life friends). For me, it somewhat detracted from the genuine catching up that 
happened when I actually ran into someone from my past. I love the mystery of running into people, 
and learning about where they've been directly from them, rather than from a secondary feed of 
snippets and status updates from their manually-curated Facebook profiles. (Anonymous Associate 
Project Manager at Google, n.d., para. 5). 


In another example of the growing unease and dissatisfaction about communication technology, Susan
Conley writes in “Smartphone Addiction: Why I’m Putting the Phone Down”:


So for months I've been feeling stuck -- I've got this snazzy Smartphone, and I should probably use 
it. And I've also been feeling a little worried -- what is this phone doing to my brain anyway? Why 
do I have this email compulsion? .. And I'd been feeling scattered. I'd been feeling like all my 
thoughts were light..maybe it's not the Smartphone's fault, but [Nicholas] Carr says that because 
of these phones, all of us ‘stop having opportunities to be alone with our thoughts, something that 
used to come naturally.’ I knew I was going to have to throw my Smartphone away too. (Conley, 
2012, para. 5-7).


4.1.2 External values: 
Pushing back due to political, religious or moral reasons 


These people often cite a desire to reconnect with family or adhere to political or religious beliefs that encourage
selfless behavior and face-to-face interaction with others. Some people cite concern with the politics of the 
internet, fearful that marketing, consumerism and distraction are enveloping the user. For example: 


‘Everyone now wants to know how to remove themselves from social networks. It has become 
absolutely clear that our relationships to others are mere points in the aggregation of marketing 
data. Political campaigns, the sale of commodities, the promotion of entertainment — this is the 
outcome of our expression of likes and affinities’. These are the opening words for the Facebook
Suicide Bomb Manifesto written by Sean Dockray and first published in the iDC mailing list May 
28, 2010. (Karppi, 2011, para. 1). 


4.1.3 Taking back control:
Users pushing back to regain control of their time and energy 


The concern is primarily about time management and feeling that some technology use, often a specific 
type of technology, like social media or web surfing, is “stealing” productive time from the user. This is a 
very frequent secondary motivation (not always the primary one) among technology users. In the web 
article “LabRat: What Happens When You Unplug from Your Internet Addiction?” Brittany Ancell writes, 
“While I was constantly searching for ways to become more efficient at work, I was idling away my free 
time with trivial eBay pursuits and constant email monitoring” (Ancell, n.d., para. 2).


4.1.4 Addiction: 
Pushing back as a result of technology addiction 


Variations on the term “addiction” are frequent in user testimony. This fear is expressed in both young and 
old, arguably more often in younger people. “‘I clearly am addicted and the dependency is sickening,’ said 
one student in the study. ‘I feel like most people these days are in a similar situation, for between having a 
Blackberry, a laptop, a television, and an iPod, people have become unable to shed their media skin’” 
(ICMPA, 2010, para. 1). 


4.1.5 Privacy: 
Users pushing back due to fear about their privacy being violated 


Most of all, these technology users fear that they are being monitored and/or their online identities are in 
jeopardy. In “Why I Left Facebook and Where You Can Find Me Online”, blogger Michael W. Dean writes, 


Facebook is starting to act like The State. Instagram, which is owned by Facebook, has updated 
their “user agreement” to say that they can sell any of your photos and not pay you. And they can 
use photos of your face. They could sell a photo of you smiling with a gun to an anti-gun campaign. 
If you’re overweight, you could end up in the “before” photo for a weight loss pill. etc.....Facebook
is spying on you. Of course these days, you are being spied on everywhere, all the time, by 
governments and corporations, but Facebook is the worst of the worst. And their privacy settings 
are useless (2012, para. 4). 


It is interesting to note that while emotional dissatisfaction is the most frequently reported reason to push 
back and resist online connectivity, taking back control over one’s time, energy and attention is most 
frequently reported as a secondary reason for pushback. Privacy, on the other hand, is the least frequently 
reported reason driving pushback (either as a main driver or as a secondary one).


4.2 Five Pushback Behaviors


Behaviors for pushback and resistance to connectivity were overall more consistent in the literature, with 
a heavy predominance of one type of behavior: adaptation. Technical solutions, social solutions, and radical 
solutions (complete withdrawal) were less prevalent; also, a small cluster of pushback behavior is actually 
a resistance to the pushback, claiming that there is “no problem.” 


4.2.1 Behavior Adaptation: 
Manage technology use to reduce dissatisfaction 


Several adaptations to previous behaviors in relation to technology use are displayed in the literature: 
manage time (only use at specific times), manage applications (for example, drop Facebook and use only 
email, or vice versa), digital fasting (for example, an hour/day/week of no media), and dummy accounts
(to reduce spam or other unwanted communication). These types of behavioral adaptations are the most 
frequently cited in the literature. They are directed to responsibly managing technology use in a rational, 
more efficient, more “mindful” way that creates better life balance. After discussing why he is leaving 
Facebook, blogger Michael Dean writes where he can be found instead: 


I’m not leaving the Internet. I love the Internet. I’ve been on it since 1990 (before the World Wide 
Web), and I’m still going to be around. I just hate Facebook. You can find me on Twitter, here. 
You can find Freedom Feens, my thrice-weekly podcast with Neema Vedadi, here. You can 
subscribe to that via RSS or iTunes, and post comments on the site, and I sometimes comment 
back. You can subscribe to the torrent link here. (Dean, 2012, para. 6). 


Some prefer to choose specific times to go online, rather than choosing specific tools, and others prefer to 
have times set aside without media. These behavior adaptations are the most common ways that people 
deal with their sense of dissatisfaction caused by communication technology and information overload. The 
following are other, less frequent, forms of coping we found in the literature. 


4.2.2 Social Agreement: 
Collective decisions to limit media use 


An interesting modification of the behavioral adaptation is the social agreement: rather than individual 
change, a group agrees to use communication technology in a different (restricted) way for a certain period 
of time, often in the context of a gathering. A common example is users agreeing to turn off or put away 
their phones in a meeting or at a restaurant (and the first one to use it pays the bill!), or having restaurants 
offer a 5% discount to eat without your phone (Kim, 2012). A new trend in weddings (regular people, not 
celebrities) is to have parties “unplugged” by having guests check their phones at the door or explicitly 
request guests to turn them off (Feiler, 2013). More broadly, there are unplugging events such as the 
National Day of Unplugging, initiated by the Reboot Network, creators of The Sabbath Manifesto. Per their 
website: 


We increasingly miss out on the important moments of our lives as we pass the hours with our 
noses buried in our iPhones and BlackBerry’s, chronicling our every move through Facebook and
Twitter and shielding ourselves from the outside world with the bubble of “silence” that our
earphones create. If you recognize that in yourself — or your friends, families or colleagues — join 
us for the National Day of Unplugging, sign the Unplug pledge and start living a different life: 
connect with the people in your street, neighborhood and city, have an uninterrupted meal or read 
a book to your child. (Sabbath Manifesto, 2013, Join Our Unplugging Movement, para. 2). 


4.2.3 Tech Solution:
Trusting technology to reduce media use 


The tech solution ironically places the control in a technology solution to prevent information overload. 
Most common is the downgrade of a smart phone to a “dumb” phone. This category also includes parental 
controls over times or applications, or the use of a “kosher phone” or similar devices programmed to restrict 
content and times of use. In an increasingly common move, many people have abandoned smart phones for 
“dumb” phones. The tech solution forces the user to conform to more limited technology. For example, an 
anonymous blogger expresses the following sentiment in “Why I ditched my smartphone for a ‘dumbphone’”:


Smartphones are impressive gadgets that allow us to conveniently do many things and interact in 
ways that were unheard of 10 years ago....it ultimately comes down to my own personal journey 
and me trying to figure out what I want from life. Sometimes it’s good to take a step back and 
evaluate things from a wider perspective. Am I making the best use of my time and resources? Do 
I really NEED some of the things I have? When it came to my smartphone I felt like it was 
something I could — and should — do without. (Anonymous, 2011, Conclusion). 


4.2.4 Back to the Woods: 
Dropping out from technology altogether 


As an extreme reaction, some people are going completely offline, or at least adopting severely limited 
internet usage, barely minimal phone use, or both. They do it for themselves or for their families, and it 
sometimes goes unreported precisely because they are dropping out. In one example, a mom takes the family 
offline: 


With the help of her family therapist, Jindra, a single mom, devised a technology 
intervention...From that point on, there were no iPads, no computers, no television, and no Wii. 
Phones are allowed, but only when necessary. The boys did not take to this plan easily... Although 
he does want his computer time back sooner rather than later, Erik (10 years old) is enjoying this 
new lifestyle. ‘I realized there's a lot of other fun things to do. Going to the park is now nicer than 
staying inside and sitting in front of the computer for an hour.’ (Berman, 2013, para. 3-5, 12).


4.2.5 No Problem: 
Whatever it takes, just take it all in


Finally, in an opposite reaction, some people are also reacting to pushback, claiming there is nothing wrong
with technology and their use of it. These are critical enthusiasts without reservation. In “The Dirty Truth 
about Digital Fasts” Alexandra Samuel writes for Harvard Business Review: “If longer-term digital fasts 
can remind you how to integrate offline moments back into your daily life, that's great. But you don't need 
a digital fast to justify meeting your needs online, and you don't need to unplug in order to justify plugging 
back in” (Samuel, 2010, para. 12). 


5 Discussion and Conclusions 


While compiling the sources for this study of the literature, we did not approach the work with preconceived 
hypotheses. We began by searching for information on behavior and quickly became interested in why 
pushback was occurring, not just “how”. Searches in academic databases and search engines (such as Academic Search
Complete, Google Scholar, IEEE Xplore, Compendex, and Google) included, but were not limited to, the
following terms: digital fasting, technology resistance, unplugging, disconnecting, information overload, 
information anxiety, slow media, connecting versus disconnecting, digital overload, digital suicide, Facebook 
suicide, slow spaces, social media diet, digital Sabbath, over-connectedness, techno-stress. After reading 
through numerous blogs and websites, it was apparent that many of the reasons stated were emotional in 
nature, not monetary or strictly pragmatic. Emotional dissatisfaction was clearly a very strong motivation.
It is distinct from external values (another motivation) because it arises when the user’s emotional needs
are not met, whereas the external values motivation rests on moral or ethical concerns.
Similarly, the word “addiction” is heavily bandied about on web pages
and blogs. Control was another issue that users reported repeatedly. Once these motivations were
identified, they warranted a wider scholarly search for research papers and studies that encompass both
technology resistance and user emotional response.
The following are some areas that may warrant additional research: 


5.1 Possible Correlations between Pushback Motivations and Behaviors 


What kinds of motivations drive different types of pushback behaviors? As we searched websites, blogs,
and newspaper reports, the common user behaviors defined by social agreement, adoption of technological
solutions, and behavioral adaptation became apparent. A daughter who signs a contract with her father to
accept $200 in exchange for giving up her smartphone has entered into more of a social agreement than a
legal one to limit her technology use (Gross, 2013). In “Why I ditched my smartphone for a ‘dumbphone’”,
the user abandons a smartphone for a “dumber” flip phone and is exercising a technology-switching
behavior, i.e., a “tech” solution (Anonymous, 2011). Deactivating a Facebook account while still
using other technology is a type of limited withdrawal, a means of controlling technology by limiting
the type of technology used regularly; in other words, a form of behavioral adaptation (Jung, 2013).


5.2 Paranoia and Privacy 


We were also surprised by the lack of concern with privacy. Both as a primary issue and as a secondary 
issue, it was not a significant concern amongst users in the literature. Addiction (or fear of addiction) and 
taking control as a motivation (which revolves around a feeling of wasting time) were strong secondary issues
for many users. In fact, concern about wasting time was as strong a concern as emotional dissatisfaction, 
though emotional dissatisfaction was expressed as a primary concern more often. It is clear from the 
breakdown of user behavior that few people are interested in forsaking technology altogether (Back to the 
Woods) or using technology to limit their usage; for example, dumbing down the phone or disabling the 
laptop’s internet capabilities. Celebrated author Jonathan Franzen has reportedly permanently disabled his 
computer so that he cannot access the internet while writing (Grossman, 2010, p. 2). From our research, 
this is an extreme and uncommon coping behavior. But the generalized lack of concern for privacy, at a 
time when privacy is all but disappearing, is most troubling. In the words of New York Times columnist
Ross Douthat, “ ‘Abandon all privacy, ye who enter here’ might as well be stamped on every smartphone 
and emblazoned on every social media log-in page (...) the Internet, in effect is a surveillance state” 
(Douthat, 2013). How will awareness of privacy evolve and shape people’s uses of technology and social 
media? 


5.3 Rational Behavior 


Behavior adaptation is the way that most technology users are managing their technology use when they 
are troubled by any of the five motivations identified in this literature review. That said, this is a broad 
category that encompasses a number of technology usage strategies. Essentially, this indicates that users 
are technology-friendly overall, but have decided to withdraw or limit their use of one or more types of 
technology. Given the modern inundation of technology options, a pushback to reclaim time, or avoid 
unfulfilling experiences might not be surprising. Response to technology that is only partially satisfying 
involves rational management of technology by: limiting usage, scheduling usage to limit addictive or 
compulsive behavior, or forsaking some technology altogether while still using other technology that 
provides greater satisfaction. Adaptation is therefore the predominant behavior. What are the
different forms of behavioral adaptation that people are exhibiting, as they learn to cope with 
communication technologies and information overload? What are the cultural and design implications of 
these shifts? 

Ever-Wasers might easily argue that the new technology is no more a problem than TV was when 
it came out and critics railed against the waste of time and mindlessness of the new entertainment. The 
difference is that entertainment is only a small part of the new landscape. Social media, smartphones, 
texting, video calling, blogging, emailing and even YouTube videos are meant to make it so much easier to 
share, connect, and create with other human beings than ever before. Instead, technology users are 
expressing a sense of loss. Virtual connection is not turning out to be as rewarding as so many of us thought 
it would be, and a growing number of people are saying “better less.” 

Having avoided online distractions for a full year away from the Internet, technology writer Paul Miller 
concluded this in his blog post “I’m still here: back online after a year without the internet”: 


I'd read enough blog posts and magazine articles and books about how the internet makes us lonely, 
or stupid, or lonely and stupid, that I'd begun to believe them. I wanted to figure out what the 
internet was "doing to me," so I could fight back. But the internet isn't an individual pursuit, it's 
something we do with each other. The internet is where people are. (Miller, 2013, para. 53).


If technology both helps us to connect, and at the same time drives us apart, we need to learn to manage 
technology, and know when to push back. Longing for connection to people is what makes it difficult for 
users to push back on technology, what brings them back. But technology seems to overpromise and 
underdeliver in this respect. Nonetheless, it seems Pushback may also have a pushback movement. 


1 Introduction


Guantánamo Bay Detention Camp, located in the US naval base, Cuba, first opened its doors to those 
captured in the “Global War on Terror” in January 2002, exactly four months after the September 11, 2001 
terrorist attacks on the USA. Prior to its opening, it had already been decided by President George W. 
Bush that detainees would not be afforded rights guaranteed by the Geneva Conventions, resulting in years of
now well-documented abuses. While an abundance of literature exists on these abuses, the legal status of 
the detainees, and possible solutions to the intractable problem of what to do with the detainees after 
Guantánamo Bay, there appears to be no analysis from the library and information science field detailing 
the denial of information access and its effects on the detainees. 

To give context and comparison for the subsequent discussion of information access rights
and realities for detainees at Guantánamo Bay, this paper first analyzes the information access rights afforded
to prisoners held by the US military, whether as enemy prisoners of war, civilian internees, or members of the
US military, and to prisoners held in ADX Florence, the only federal “super-maximum security” (“supermax”)
prison. The analysis of Guantánamo Bay could apply to any other facility the US has utilized for the
extra-legal detention of non-US citizens during the War on Terror, such as Abu Ghraib, Bagram, and several
other “secret” prisons. Guantánamo Bay was chosen for its notoriety and the relative wealth of information 
that exists about day-to-day operations and conditions in comparison to any other such facility. After a 
thorough examination of information access at Guantánamo Bay, the effects of limiting information access 
are considered in relation to both detainee wellbeing and to the previously positive reputation of the United 
States as a global leader in human rights. As President Barack Obama continues to state his commitment 
to close Guantánamo Bay, information access rights need to be considered as an important part of any 
future detainment arrangements. First, however, it is essential to define information for the purposes of the
analysis that follows.


2 What is Information?


The definition of information matters in the context of information rights and this paper: without a clear, 
common understanding of what constitutes information, it is impossible to define information rights and 
therefore to evaluate whether those rights are being upheld. 

The Oxford English Dictionary (“OED”) defines information as “knowledge communicated concerning
some particular fact, subject, or event; that of which one is apprised or told; intelligence, news.”
Buckland’s (1991) oft-cited three categories divide information into information-as-process, which is the
act of informing, of communicating, of changing one’s knowledge; information-as-knowledge, being the
information itself that is perceived; and information-as-thing, those objects that contain information such as
data and documents. Ruben (1992) categorizes information into three orders: first-order is “environmental
artifacts and representations; environmental data, stimuli, messages, or cues;” second is “internalized
appropriations and representations” or “semantic networks, personal constructs, images, rules or mind;” and,
third-order information is “socially constructed, negotiated, validated, sanctioned and/or privileged
appropriations, representations, and artifacts” or the social context of information, for example books,
newspapers, and letters (pp. 22-24).

For the purposes of this paper, “information” is the OED’s inclusive definition that is given further
categorization by Buckland and Ruben. Information, as applied to the inmates and Guantánamo Bay
detainees examined in this paper, is not only books, newspapers, letters, and telephone calls, but also the
spatial and geographical environment (the sights, smells, sounds) surrounding them, the education and
knowledge gathering process, communicating to and receiving communication from others, and the ability
to utilize and enjoy one’s own mind and cognitive abilities.


3 What are Information Access Rights? 


Information access rights are a small part of the spectrum that is human rights: “it is universally believed 
now, but not necessarily practiced, that access to information is everybody’s right” (Smith, 1995, pp. 169- 
170). In this vein, the US Supreme Court in Martin v. Struthers (1943) extended the First Amendment 
right of free speech to include the right to receive information. However, detainees held in the US Naval 
Base at Guantanamo Bay are beyond the protections of the US Constitution because of the base’s location 
within Cuba’s sovereign territory. Nevertheless, the US has adopted, ratified, and/or is a signatory to 
international treaties that impose a requirement of access to information, all of which should apply to 
detainees irrespective of their geographical location. 


3.1 Information Access as a Basic Universal Human Right 


McIver, Birdsall, and Rasmussen (2003) argue that the right to communicate is a basic universal human 
right, quoting Fisher’s (1982, p. 8) holding that the right to communicate “springs from the very nature of 
the human person as a communicating being and from the human need for communication, at the level of 
the individual and of society.” Sturges and Gastinger (2010) argue that “individuals need a broad and self-selected
set of skills across the range of formats and media to support their human right to information” (p. 200) 
and consequently it is a governmental responsibility to foster these skills “to ensure that people have the 
skills to make best use of the rights that Article Nineteen offers” (p. 199). The US Government through its 
military command of Guantanamo Bay has the responsibility to create an environment whereby information 
rights can be freely exercised. 


3.2 International Law and its Relation to the US Constitution 


Several articles of the Universal Declaration of Human Rights (“UDHR”) (United Nations, n.d.) concern
information access rights. The most obvious, Article 19, provides “[e]veryone has the right to freedom of 
opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive
and impart information and ideas through any media and regardless of frontiers.” 

Article 19 of the International Covenant on Civil and Political Rights (“ICCPR”) also provides
that there is “freedom to seek, receive and impart information and ideas,” stating that the right to freedom 
of opinion and expression can only be restricted if provided for by law and necessary to protect the rights 
of others or for the protection of national security or public order, health, or morals (United Nations, 1966). 
This echoes Article 12, UDHR, which states “[n]o one shall be subjected to arbitrary interference with his 
privacy, family, home or correspondence” (United Nations, n.d.). 

Article 26, UDHR provides “everyone has the right to education” (United Nations, n.d.).
Information access is an essential corollary to this right in order for Buckland’s (1991) information-as-process to
be fully realized.

Article 5, UDHR, and Article 7, ICCPR, as well as the Convention Against Torture and Other 
Cruel, Inhuman or Degrading Treatment or Punishment (United Nations, 1984), provide that no one shall 
be subject to torture or cruel, inhuman or degrading treatment or punishment. The US government took 
exception to these articles of these treaties and elected to defer to the current standards in the Eighth 
Amendment to the US Constitution, with the additional result that a private cause of action would not be 
created in the US courts for those seeking redress. Unsurprisingly, there is little case law concerning whether
“enhanced” interrogation techniques, for example hooding and sleep deprivation, or conditions of detention,
such as solitary confinement or prohibited access to the outside world, as experienced by detainees in
Guantánamo Bay, violate the Eighth Amendment.

The Human Rights Committee of the United Nations held in Floyd Howell v. Jamaica (1998) that 
prolonged solitary confinement and incommunicado detention, and of course physical violence and threats 
of torture, violate Article 7’s prohibition against cruel, inhuman or degrading treatment or punishment. 
Unquestionably these practices result in the severe denial of access to Ruben’s (1992) first-order
environmental information. However, the test used by US courts to determine whether these actions might violate
the Eighth Amendment — whether the government has a good faith legitimate governmental interest, and 
did not act maliciously and sadistically for the purpose of causing harm (Hudson v. McMillian, 1992) — 
will undoubtedly always find in favor of the government in all but the worst cases of abuse. 

The Sixth Amendment to the US Constitution is reflected in Article 9(2), ICCPR whereby “anyone 
who is arrested shall be informed, at the time of arrest, of the reasons for his arrest and shall be promptly 
informed of any charges against him” (United Nations, 1966). Guantánamo Bay detainees, denied any rights 
under the US Constitution, should retain the right to receive information concerning the reason behind 
their detention. 

Ruben’s (1992) second-order information — the individual “system,” personal constructs, and the 
mind — can be found in Article 18 of both the UDHR and ICCPR: “everyone has the right to freedom of 
thought, conscience and religion” (United Nations, n.d., 1966). Guantanamo Bay detainees have the right 
to their thoughts, their internal processes, and their beliefs, without interference, either direct or indirect, 
from the government. 

As a signatory to the UDHR and ICCPR, the US government provides for information access rights
in principle. What follows is an examination of the outcomes of the US government’s practice and policy
concerning information access rights.


4 Information Access at US Federal Supermax Prison 


The Federal Bureau of Prisons has one “administrative maximum facility” or “supermax” prison in its 
system: ADX Florence, Colorado. As of May 9, 2013, it housed 442 convicted felons (Federal Bureau of 
Prisons, 2013), including “shoe bomber” Richard Reid, mastermind of the 1993 World Trade Center bombing
Ramzi Yousef, and Oklahoma City bomber Timothy McVeigh’s partner, Terry Nichols (Soylent Communications,
2012). Most have decried the conditions at ADX Florence: a former warden called it a clean
version of hell (Theoharis & Sassen, 2012); Hanson (2011) describes it as “place that is a symbol of all 
things bad, evil, and corrupt; an enormous monument to the ills of humanity set against a Rocky Mountain 
backdrop;” and the UN Committee Against Torture in 2000 and 2006 expressed concern about the periods 
of isolation prisoners endure in US supermax prisons (Hanson, 2011). Prisoners are separated from one 
another by thick concrete walls and steel doors, spend their outdoor recreation time (two to five times per week)
in cages, and have only a single view of the sky and roof through their 4-inch by 4-foot windows
(Theoharis & Sassen, 2012; Hanson, 2011). 

Prisoners held in ADX Florence, and dozens of other state-run supermax prisons, are being held
in facilities specifically designed to maintain extended periods of sensory deprivation (Stoelting, 2012, p. 7). 
The average stay in California’s Pelican Bay State Prison Secure Housing Unit (“SHU”), a special detention 
unit within that supermax prison, is eight years (Reiter, 2012). The majority of those in ADX Florence are 
incarcerated for life without the possibility of parole. Every prisoner in a state or federal supermax is a 
convicted felon. 

Despite these conditions, inmates at ADX Florence are permitted to send and receive correspondence
and access publications, can make telephone calls one to two times per month, and arrange family
visits. Each cell contains a television showing religious, educational, and entertainment programming and
in-cell educational programs are available. Inmates not subject to disciplinary sanctions are permitted five
visits of up to seven hours each per month (Federal Bureau of Prisons, 2008). Their information access
rights are, in principle, upheld within the parameters of a supermax facility and its heightened security 
concerns. 

However, once an inmate receives a sanction for a disciplinary infraction, what little contact they 
do have with others is stripped away (Stoelting, 2012, p. 1). For example, those in Pelican Bay’s SHU find
themselves in “Privilege Group D” and are no longer allowed family visits or telephone calls. Even if an 
inmate is permitted family visits, supermax prisons tend to be prohibitively far away from family members. 
Inmates, such as those who successfully filed a lawsuit against California Governor Jerry Brown in Ashker 
v. Brown (2013), allege that prolonged solitary confinement (for these inmates, 10 to 28 years) violates 
Eighth Amendment prohibitions against cruel and unusual punishment, especially in light of the absence of 
meaningful review. In practice, inmates’ information access rights, especially to Ruben’s first-order
environmental information, are being violated.

Prisoners in supermax prisons located on sovereign US soil are protected by the US Constitution 
and inmates have potential redress in the courts. However, most are not able to mount a successful challenge 
(for examples, see Stoelting, 2011). Many inmates can — at least in theory — work and/or behave their 
way out of complete isolation. However, the policies behind conditions and regulations found in US supermax
prisons prevent most inmates from earning transfer to a less restrictive facility. The inability of an
inmate to earn a reduction of restrictions has the concomitant effect that access to information of all kinds
remains severely restricted.


5 Information Access at US Military Prisons 

Field Manual No. 3-19.40 (“FM 3-19.40”), titled Military Police Internment/Resettlement Operations,
“depicts the doctrinal foundation, principles, and processes that MP [military police] will employ when dealing
with enemy prisoners of war (EPWs), civilian internees (CIs), US military prisoner operations, and MP
support to civil-military operations” (Department of the Army, 2001). FM 3-19.40 details the protections
afforded to EPWs and CIs under the Geneva Conventions and states that the “basic US policy underlying the
treatment of detainees and other captured or interned personnel during the course of a conflict requires and
directs that all persons be accorded humanitarian care and treatment from the moment of custody until
their final release or repatriation” (p. 1-12).


5.1 Civilian Internees (“CIs”)


A CI is a non-military person considered a security risk, or someone who has committed an offense (insurgent,
criminal) (Department of the Army, 2001, p. 1-3). They are protected under the Geneva Convention
Relative to the Protection of Civilian Persons in Time of War (“Geneva IV”), protections which are reflected
in the text of FM 3-19.40. CIs have the right to receive the rules and regulations under which they are
interned in a language they can understand, to send and receive correspondence, and to have dependent
children interned with them in order to keep families together. Verbal instructions and orders should be
given to the CIs in their native language, as should all notices and announcements to “ensure information
is easily accessed” (Department of the Army, 2001, pp. 5-10 to 5-11). Any disciplinary matters should be conducted
with a translator present, having given the CI precise information about the allegation and an opportunity
to defend against it. CIs who are confined due to a disciplinary matter are still entitled to 2 hours of time in the
open air daily and to send and receive letters, cards, and telegrams. Social, recreational, and educational
programs are encouraged, visits by close relatives are permitted if possible within the country of internment,
and religious freedoms are fully respected. Medical and dental care, including psychiatric treatment, is
provided as needed (Department of the Army, 2001, pp. 5-1 to 5-18).


5.2 Enemy Prisoners of War (“EPWs”) 


Under the Geneva Convention Relative to the Treatment of Prisoners of War (“Geneva III”), an EPW is
a member of an enemy armed force, or a member of a militia, volunteer corps, or civilian accompaniment
that forms part of an enemy armed force (International Committee of the Red Cross, 1949a). Many of the
same rules and regulations regarding the treatment of CIs apply to EPWs, such as receiving both written
and spoken rules, regulations, orders, and notices in a language that can be understood by the EPWs, being
fully informed of any disciplinary actions against them, allowing free exercise of religion, and providing all
medical care as needed. Detained EPWs are represented by a senior EPW officer or elected representative,
who has broad permission to communicate with outside groups, such as the International Committee of
the Red Cross and US military authorities. EPWs are provided with a limited amount of free outgoing, and
unlimited incoming, correspondence, and may send and receive telegrams, but cannot make telephone calls
(Department of the Army, 2001, pp. 4-1 to 4-24).


5.3 US Military Prisoners


The Army Corrections System (“ACS”) is outlined in FM 3-19.40, but receives detailed attention in Army 
Regulation (“AR”) 190-47 (Department of the Army, 2006). ACS facilities are classified by three levels of 
security, with the United States Disciplinary Barracks (“USDB”) at Fort Leavenworth, Kansas, providing 
maximum-security, long-term incarceration for up to 515 of those convicted of the most serious crimes, 
irrespective of the branch of the US military in which a prisoner served (p. 3). According to the USDB 
website: 


“Correctional and treatment programs consist of individual and group counseling for self-growth 
and crime specific, education classes, and vocational training. Vocational training certificates are 
offered in barbering, carpentry, embroidery, engraving, graphic arts, laundry/dry cleaning, printing, 
sheet metal, and welding.” (U.S.D.B., 2012b) 


Welfare activities include recreation time, access to reading materials, and retention of personal letters and 
photographs, books and magazines, and textbooks. A comprehensive library is available, including legal 
resources and a “varied and authoritative collection of reading material aimed at encompassing the various 
reading levels, interests, and cultural backgrounds of the prisoners confined” (Department of the Army, 
2006, p. 13). Family visitations are encouraged, provided that an inmate’s family members can afford to
make the trip to Kansas, but once there, minimal physical contact is allowed between visitors and prisoners 
except under extraordinarily special circumstances (U.S.D.B., 2012a). 

Regulations concerning correspondence are permissive, including limited free mail for inmates not 
receiving pay for prison work, with stationery being free from any indication that the inmate is confined 
(Department of the Army, 2006, pp. 41-42). However, letters must be written in English if at all possible 
(p. 41). Telephone communication is permitted (p. 42). AR 190-47 also states that military prisoners are
entitled to legal assistance, as well as access to a law library, and are kept “informed concerning the status 
of their cases or sentences and other pending legal matters” (p. 13). Mental health support is provided 
through a multi-disciplinary team (p. 17). 

Information access principles for those held in military prisons, whether a CI, EPW, or an inmate
of the USDB, are generally permissive, especially in comparison with US supermax prisons. Access to information
appears to be restricted only by legitimate practical concerns such as security and space, rather
than as a matter of principle. The outcome of the above policies supports the ability of internees and inmates
to access Ruben’s (1992) second-order internal information, constructs, and mental processes, as well as
Ruben’s first- and third-order information.


6 Brief History of Guantanamo Bay Detention Camp 


Following al Qaeda’s September 11, 2001 terrorist attacks on the USA, and as part of the United States 
Military’s subsequent Operation Enduring Freedom waged in Afghanistan, President Bush issued a military 
order on November 13, 2001 (Bush, 2001), concerning the detention, treatment, and trial of any non-US 
citizen captured in the War on Terror. The order required that detainees be “treated humanely” (Bush, 
2001). Two months later on January 11, 2002, the first twenty detainees arrived at Camp X-Ray, Guantánamo
Bay Naval Base, Cuba, a detention camp formerly used to house Haitians and Cubans who had
attempted to sail to Florida in the mid-1990s (Cable News Network, 2002). By the end of that month, 156
detainees found themselves housed in the outside cells at Camp X-Ray (Scheinkman et al., 2013). By April 2002,
prisoners were transferred to the purpose-built Camp Delta (now renamed Camp 1), comprised of 200 open-air,
steel mesh cells and an outdoor exercise area (United States, 2009, p. 11).

As the war progressed and more detainees arrived at Guantánamo Bay, additional camps were
added. Camps 2 and 3, similar in configuration to Camp 1, opened in October 2002. Camp 4, opened in
February 2003, was designed as a communal living camp, complete with recreation, education, and
entertainment, for Guantánamo Bay’s most compliant detainees. As of November 2012, Camps 1 through 4 were
no longer in use (United States Government Accountability Office, 2012, p. 15).

Camp 5, added in April 2004, and Camp 6, opened in December 2006, were both modeled after US
supermax prisons. Camp 5 cells have a clear window allowing natural light to enter, whereas Camp 6 cells 
receive sunlight through skylights. Both facilities have adjoining recreation yards, with Camp 5 having 
additional open-air, steel mesh cells for non-compliant detainees. Camp 6 has now been renovated to a 
shared housing unit. Camp 7 houses the so-called “High-Value Detainees” transferred from the CIA in 
September 2006 in a maximum-security, climate-controlled facility with individual recreation cages. Camp 
Iguana, originally housing child detainees, now holds pre-release detainees — those no longer classified as 
an enemy combatant — in a communal facility with free movement around several buildings containing 
living quarters, religious facilities, a library, and laundry facilities (United States, 2009, pp. 11-13). Camp
Echo, holding compliant detainees who would otherwise be at risk in Camp 6, consists of 10 wooden hut-like
structures, with each detainee having their own hut (United States Government Accountability Office,
2012, p. 21).
At its peak in June 2003, Guantánamo Bay held 684 detainees, with a total of 779 individuals
having been sent there (Scheinkman et al., 2013). Since then, detainees have been slowly released or transferred
to facilities in other countries, and as of November 2013, only 164 detainees remain at Guantánamo
Bay (Scheinkman et al., 2013). Over half of these have been cleared for release, but cannot return home,
due to the risk of inhumane treatment if repatriated, or refusal by their country of citizenship to take them 
back. Others have simply not been released (Clive Stafford Smith, personal communication, May 9, 2013). 
46 detainees are being held indefinitely, without having ever received a hearing of any kind (Shiffman, 
2013). 


6.1 Why Guantanamo Bay? 


The reasoning behind holding those captured in the War on Terror at Guantanamo Bay rather than facilities 
on sovereign US soil has been continually questioned since late 2001. Perhaps it was, initially, merely a 
misguided attempt to keep the US mainland more secure in the face of previously unimaginable terrorist 
threats. However, Dratel (Greenberg & Dratel, 2005) argues that there can only have been “pernicious 
purposes designed to facilitate the unilateral and unfettered detention, interrogation, abuse, judgement, and 
punishment of prisoners,” namely to put detainees beyond courts, the law, the United States Constitution,
and the Geneva Convention, and to absolve the US government of any liability for war crimes of those involved
with the process (p. xxi). Whichever of these viewpoints is true, the result was, and remains, the holding
of several hundred detainees beyond the reach of the law, subjected to treatment which, by
the standards found in every other prison system in America, would be considered harmful and abusive.


6.2 What Rights Ought to be Afforded to Guantanamo Bay Detainees? 


Detainees in Guantanamo Bay are not protected by the US Constitution. Based on US Department of 
Justice memos (Philbin & Yoo, 2001; Yoo & Delahunty, 2002; Bybee, 2002), President Bush declared that 
Geneva III did not apply to those captured during the War on Terror because, as members of the fighting 
force of al Qaeda and the Taliban, both non-signatories of the Geneva Conventions, they could not be
classified as Prisoners of War (Bush, 2002). 

Seemingly forgotten were provisions in Geneva IV applicable to any person captured in hostilities
which, although allowing for temporary suspension of certain rights if “prejudicial to the security of such
State,” guarantee humane treatment and a fair and regular trial should one be held (International Committee
of the Red Cross, 1949b, Article 5). In 2006, this sorry state of affairs was finally overturned in
Hamdan v. Rumsfeld, which held that detainees were entitled to the minimum protections afforded by 
Geneva III. By 2008, in Boumediene v. Bush, detainees received the constitutional right of habeas corpus, 
but no additional constitutional rights were afforded them or have been since. 

Irrespective of the US government or courts, “[p]ersons have a moral duty to respect human rights, 
a duty that does not derive from a more general moral duty to comply with national or international legal 
instruments,” (Pogge, 2000, p. 46). Pogge (2000) continues: human rights “express weighty moral concerns, 
which normally override other normative considerations ... all human beings have equal status” (p. 46, 
emphasis original). The US government, its departments, and its agents have a duty separate from any 
legislative duty to uphold the human rights, including information access rights, of all people its actions 
affect. 


7 Information Access for Guantanamo Bay Detainees 

In the early days of the War on Terror, the conditions for Guantánamo Bay detainees were shrouded in 
secrecy, the exception being some leaked photographs and reports published in the media. Even less was 
known about the conditions suffered by those held in Abu Ghraib, Bagram, and an unknown number of 
secret prisons. The conditions experienced by detainees during this period are now known to have included 
widespread cruel and inhuman treatment and torture. Recent reports indicate that conditions are slowly
improving, but questions remain as to whether they have improved enough to no longer be considered a 
violation of the detainees’ human rights. 


7.1 Cell Conditions 


The Center for Constitutional Rights (“CCR”) (2009) reports that “solitary confinement, sensory deprivation,
environmental manipulation, and sleep deprivation are daily realities for these men” (p. 3). Many
detainees are confined to their cells, often with the air conditioning making it too cold, for at least 20 hours 
per day (p. 7). 

Camp 6 has no outside-facing windows and Camp 5 cells only have thin opaque slit windows. Some
detainees have reported that even these small windows were sometimes painted over (Human Rights Watch,
2008, p. 48) but currently, detainees are only held in cells with clear windows (United States, 2009, p. 18).
Outside recreation time is not guaranteed to be during daylight hours, with the result that some detainees
never see the sun (Center for Constitutional Rights, 2009, p. 5). Cells are illuminated around the clock with
fluorescent lighting (Council of Europe, 2007, p. 26; Center for Constitutional Rights, 2009, p. 7), which
some detainees report has caused them vision problems, even blindness (Human Rights Watch, 2008, pp. 30, 35, 39).

Constant noise has also been reported by detainees. The construction materials used in Camp 6 
amplify noise (United States, 2008a, p. 35) and the now-closed Camp 3, which served as a punishment unit, 
was next to constantly running, loud machinery (Human Rights Watch, 2008, p. 8). Communication in Camp
3 was impeded by this noise, and further limited by housing detainees apart from one another. In Camps 5
and 6 communication is restricted by thick walls and steel doors.

The inability to communicate due to prolonged solitary confinement and incommunicado detention 
violates Article 19, ICCPR and Article 12, UDHR, and has been declared by the Human Rights Committee 
of the United Nations as a violation of Article 5, UDHR’s and Article 7, ICCPR’s prohibition against cruel, 
inhuman or degrading treatment or punishment. Not only is access to Ruben’s first-order environmental 
information and third-order social information severely curtailed, but the resultant psychological problems 
that stem from these conditions deny detainees access to their own fully-functioning mind, Ruben’s second-order
information. Dr. Daryl Matthews, a forensic psychologist at the University of Hawaii, found that “the
complete loss in control over their [the detainees’] daily lives has resulted in profound depression and Post
Traumatic Stress Disorder” (United States, 2008a, p. 35). Moreover, the “absence
of social and environmental stimulation has been found to lead to a range of mental health problems,
ranging from insomnia and confusion to hallucinations and psychosis” (Human Rights Watch, 2008, p. 20),
which compounds the inability to access second-order information.

In Camp Echo and Camp Iguana, which function as communal living facilities, detainees do not 
suffer from many of the inadequacies of the other Guantanamo Bay detention facilities. In 2009, the Department
of Defense (“DoD”) reported that modifications were being made to convert Camp 6 into a
communal living facility (United States, 2009, p. 12), but according to Clive Stafford Smith, Camp 6 has 
been returned to “solitary isolation” (personal communication, May 9, 2013). 


7.2 Freedom of Religion 


The CCR (2009) reports that detainees “suffer from religious humiliation and the inability to engage in
religious practices” (p. 12). Detainees in non-communal camps are not allowed to pray face-to-face, but 
instead have to conduct their prayer calls, which require communication between the designated prayer 
caller and the other detainees, through open feed tray slots (Khan Tumani v. Obama, 2009, § 8). In those 
cellblocks where there is extremely loud background noise, detainees’ right to practice religion through
communal prayer is violated completely. In addition to violating the right to communicate, these conditions
violate Article 18 of both the UDHR and ICCPR, which grants the right to freedom of religion.


7.3 Correspondence With and News from the Outside World


In April 2002, Amnesty International reported that many of the Guantanamo Bay detainees had not been 
able to contact their families and inform them of their whereabouts, nor had their families been able to find 
out any information about them from the appropriate authorities (pp. 23-28). According to Williams (2002),
detainees were able to send a postcard to their family upon their arrival and were thereafter allowed six
pieces of mail per month. However, Human Rights Watch (2008) states that detainees reported denial
of access to writing implements (p. 48). Currently, detainees are able to send ordinary mail,
the amount of which is unrestricted for compliant detainees, and unlimited Red Cross Messages during
quarterly Red Cross visits (United States, 2009, p. 36).

When mail is sent or received, it is censored using a process that takes 60 days (United States, 
2009, p. 36). Clive Stafford Smith reports that “they censor the most ridiculous things ... one of things I’ve 
been doing just for my entertainment is to get the original of what someone’s written. So, for example, if 
one of the detainee’s twelve year old children writes him a letter, then we keep the original of it and then 
we'd see what gets through to him” (personal communication, May 9, 2013). 

Family telephone calls, at the rate of one per year, with the only exception being in the event of a 
relative’s death, have been permitted since 2009, but are heavily monitored (Center for Constitutional 
Rights, 2009, p. 14). Detainees in Camp 7 are not authorized to make telephone calls under any circumstance 
(United States, 2009, p. 34). For detainees with little to no reading and writing literacy, or families with a 
similar skill gap, telephone calls and personal visits are the only way to communicate. Only one family visit 
has ever occurred: in the case of Australian David Hicks, the authorities reportedly allowed a single, 15-minute
visit just prior to the beginning of his trial (Higham, 2004). Heavily censored newspapers are now
available to compliant detainees and news is accessible on radio and television for those permitted access
(United States, 2009, p. 34).

Detainee access to information about their family and the larger outside world is severely curtailed. 
This curtailment is in violation of Article 72, Geneva II. Article 19, UDHR rights to freedom of expression 
and to seek, receive and impart information are violated by the censorship of both incoming and outgoing 
information, even taking into account that Article 19, ICCPR provides for restrictions if necessary to protect 
national security. The lack of information and communication access prevents detainees from playing a 
meaningful part in their family lives, which is a detriment to their own health and wellbeing, as well as 
their family’s. 


7.4 Enrichment Activities 


Article 38, Geneva III provides that EPWs must be given opportunities to participate in intellectual and 
educational pursuits (International Committee of the Red Cross, 1949a). Access to reading materials has 
slowly improved over the 11 years that Guantánamo Bay has been open. All detainees are allowed books 
from the library, which in 2009 contained more than 13,000 titles, 900 magazines, and 300 DVDs (United 
States, 2009, p. 34). Non-compliant detainees are allowed to access three books and one magazine at any 
given time, those in Camps 5, 6, and 7 are allowed four books and two magazines, and those in Camps 
Echo and Iguana are allowed five books, three magazines and one personal DVD, with items being distributed
weekly (United States, 2009, p. 34). This is a vast improvement from just one year earlier when
detainees in Camp 3 were “allowed a Koran in their cell but virtually nothing else” (Human Rights Watch, 
2008, p. 8). 

The images on Charlie Savage’s tumblr photo blog, “Guantanamo prison library books for detainees,”
suggest that available materials are predominantly in English, with Arabic speakers also having a less
limited selection. Human Rights Watch (2008) reports that “several lawyers for non-Arabic-speaking prisoners
have complained that, at least in the past, their clients have had very inadequate access to books in
a language they can read” (p. 17). As of November 2013, the author is waiting for a substantive response
to a Freedom of Information request asking US Southern Military Command to produce all records relating
to the contents of the library at Guantanamo Bay.

Other than those in Camp 7, detainees that are compliant can take native-language literacy classes, 
as well as classes in English and art (United States, 2009, p. 34). Cable television and computer games are 
available in the communal living camps (United States, 2009, p. 34; Isikoff, 2013) and Camp 5 contains two 
media rooms, one on each tier (United States Government Accountability Office, p. 17). If Camp 6 has
recently been converted to its originally intended medium-security status, it includes a new outdoor
recreation yard and a media center with television and DVD player (United States, 2009, p. 11; Isikoff,
2013). However, Smith states that Camp 6 has instead been returned to a maximum-security facility (personal
communication, May 9, 2013) and therefore it is unlikely that these amenities are being made available
to detainees.

Access to intellectual and educational information appears to be problematic, especially for
detainees in Camp 7, who are being denied their Article 26, UDHR right to an education. With no in-cell
radio or television, those who have limited or no reading or writing skills may find it exceptionally hard to 
fill their time, particularly if they are a non-compliant detainee on lock-down, as their access to information 
is seriously limited. This limitation can aggravate any psychological problems detainees may have due to 
isolation and/or lack of communication. 


7.5 Participation in the Legal Process 


“Everyone — even a person suspected of the worst possible crimes — has the right not to be questioned 
without his or her counsel being present and before being informed of his or her rights in a language which 
he or she understands” (Amnesty International, 2002, p. 28). Amnesty International reports that detainees
were not receiving this information, nor were they informed of the charges against them (Amnesty International,
2002, pp. 28, 40). Many detainees, after years of detention, have still not been informed of the
charges against them and are being held without charge (Clive Stafford Smith, personal communication,
May 9, 2013). This is in direct conflict with Article 105, Geneva III (International Committee of the Red 
Cross, 1949a). 

Prior to Rasul v. Bush (2004), the US government withheld access to counsel from Guantánamo 
Bay detainees and, in many cases, held them incommunicado from the outside world (Driscoll, 2006, p. 
891). Following the decision in Rasul, which provided for attorney access to detainees, a protective order 
was issued in In re Guantanamo Detainee Cases (2004) prohibiting attorneys of detainees from sharing 
classified information with their clients. Classified information is “anything written or oral that the government
has in its possession or has ever had in its possession that it marks as classified or tells the attorney
is classified” (In re Guantanamo Detainee Cases, 2004, pp. 176-177; Denbeaux & Boyd-Nafstad, p. 500). 
This includes “most of the information relating to the facts of the client's detainment and information 
necessary to defend the client” (Denbeaux & Boyd-Nafstad, p. 500). 

Additionally, the 2004 protective order requires that all legal mail sent from counsel to detainees pass
through a “privilege team,” which redacts any potentially classified information (In re Guantanamo Detainee
Cases, 2004, p. 180). Counsel’s laptops, cell phones, cameras, and voice recorders are prohibited during
client visits, for which the attorney must seek security clearance and give 20 days’ advance notice.
Any handwritten notes taken by attorneys during the visit must be surrendered before leaving Guantánamo
Bay and are sent to the privilege team in Washington, DC (In re Guantanamo Detainee Cases, 2004, p.
191; Clive Stafford Smith, personal communication, May 9, 2013). Any information redacted by the privilege
team is prohibited from use by the attorney in legal papers or proceedings, although the attorney may visit 
the privilege team in person to re-read redacted sections of their notes (Clive Stafford Smith, personal 
communication, May 9, 2013). 

Although telephone calls between attorneys and detainees are permitted, requests for calls must be
made 15 days in advance and the calls are strictly monitored as they happen (Clive Stafford Smith, personal
communication, May 9, 2013). Requests for both expedited telephone calls and visits are possible, but Knefel
(2013) reports that these requests are denied even in dire situations such as widespread hunger striking. As 
recently as May 10, 2013, Smith (2013) tweeted that two of his clients refused his telephone calls, the 
implication being that it was due to concerns of censorship. 

Guantanamo Bay detainees are being denied access to information about their own legal status, 
their rights, and, due to censorship of their attorneys’ papers, the legal process. The denial of information 
prevents detainees from participating in their own defense or mounting a legal challenge to their ongoing
detainment. Denbeaux et al. (2006) conclude that critical information about a detainee’s legal status,
namely that he was about to be released to his homeland, might have restrained him from committing suicide on
June 10, 2006 (p. 2). Although detainee access to legal information has improved since Guantánamo Bay’s 
inception, the US government continues to practice the principle of secrecy and censorship with little regard 
for the effects of this policy on detainees. 


7.6 Do Interrogations Violate Information Access Rights? 


The methods of interrogation sanctioned during the first few years of Guantanamo Bay included yelling, 
deception, isolation and segregation, light deprivation, stress induction, and 20-hour long sessions (Greenberg
& Dratel, 2005, p. 1239). Many detainees reported use of non-sanctioned methods in addition to those
approved for use, such as beating, hooding, stress positions, sexual humiliation, and use of un-muzzled
military dogs (United States, 2008a, pp. 28, 33; United States, 2010, p. 41). Many of these methods directly
inhibit access to information, particularly regarding environmental stimuli and communication with others.
Moreover, access to Ruben’s second-order internal information would be severely obstructed by the psychological
stress that results from these interrogation techniques.

Presently, “[a]ll interrogations are voluntary; approximately one-third of the sessions are at detainees’
request” (United States, 2009, p. 15). Interrogators can provide incentives for cooperation and no basic
comfort items are taken away for failure to answer questions (United States, 2009, p. 62). According to the
DoD, “significant changes made in Guantanamo have moved interrogation practices far beyond the minimum
standards articulated in Common Article 3 [Geneva III], U.S. law and DoD regulations” (United
States, 2009, p. 63).


8 Conclusion 


Dratel (Greenberg & Dratel, 2005) argues that “[l]awyers and public officials need to be instructed ... to be
cognizant of the real-life consequences of their policy choices” (p. xxiii). The real-life consequences of US
government policy in the War on Terror include restricted access to information in all three categories of both
Buckland’s (1991) and Ruben’s (1992) categorizations. For many types of information access, the realities
of detainment in Guantánamo Bay equate to human rights violations.

Human rights violations have caused one attorney to remark that his client was “slowly but surely 
slipping into madness” (United States, 2008a, p. 35). Hunger strikes, in protest of conditions, have been a 
common occurrence at Guantánamo Bay through the years. At the peak of the strike in June 2013, 106 of
the 166 detainees were on hunger strike. By July 2013, 46 were being tube fed (Gamio & Rosenberg, 2013),
a practice that violates international standards set out by the World Medical Association and the United
Nations (France-Presse, 2013).

The reputation of the US government has suffered as a result of its Guantanamo Bay policies.
The government has the responsibility to ensure the human rights of Guantanamo Bay detainees are upheld: 
“[t]he panic-laden fear generated by the events of September 11th cannot serve as license ... to suspend our 
constitutional heritage, our core values as a nation, or the behavioral standards that mark a civilized and 
humane society” (Greenberg & Dratel, 2005, p. xxiii). The Parliamentary Assembly of the Council of Europe
(2005) has the following opinion: 


“The United States has long been a beacon of democracy and a champion of human rights through- 
out the world and its positive influence on European development in this respect since the Second 
World War is greatly appreciated. Nevertheless, the Assembly considers that the United States 
Government has betrayed its own highest principles in the zeal with which it has attempted to 
pursue the “war on terror”. These errors have perhaps been most manifest in relation to Guantanamo
Bay.”


President Barack Obama included the closing of Guantánamo Bay as a prominent issue during his 2008 
election campaign, but has so far been unable to achieve this goal, partly due to a number of statutes 
prohibiting the transfer of detainees to the United States (The White House, 2013). In a November 2012 
report, the United States Government Accountability Office examined the options for bringing Guantanamo 
Bay detainees to the mainland US, either to DoD or federal BoP facilities. Although this report was hypothetical
at the time of its writing, it provided President Obama with the opinion that the BoP has the
“correctional expertise to safely and securely house detainees with a nexus to terrorism” (p. 48). 

On April 30, 2013, the President stated that Guantánamo Bay “is a lingering problem that is not 
going to get better. It’s going to get worse. It’s going to fester” (The White House, 2013). Guantanamo Bay 
must be closed if continuing human rights violations, including the violation of information access rights, 
are to be halted. The President renewed his commitment to close Guantanamo Bay and stated he will do 
all he can administratively to achieve this, but that Congress will have to intervene to make the final closing 
a reality (The White House, 2013). It is hoped, for the sake of detainees’ human rights, that this will happen.


Abstract

Wikipedia is an open collaboration, global, multilingual project. Its guidelines and policies direct the 
collaboration process into a vision of objective and neutral encyclopedic knowledge. However, coherence 
of that knowledge, and the outcomes of the collaborative process on the same topic, can sometimes vary 
dramatically across different languages. Our goal was to explore what these differences are, and to see 
how they are contextualized in the case of a contested and conflictive topic. The empirical focus was on
the Republic of Kosovo, a recently formed country in Southeast Europe still seeking full international 
recognition. The study explores the social, cultural and political tensions through following the 
contextualization of this topic in three different Wikipedia communities: Serbian, Croatian and English. 
A constructivist (Charmaz, 1998) and substantive grounded theory of the process was created by 
following a two-step coding process. Three coders were active in different stages of the process. 
Discussions and comparisons of emergent codes, within and between three different communities, were 
conducted regularly. The core concept of our theory was neutrality dispute. It is based on four aspects: 
identities and viewpoints, their input into the process of content editing, relations between the editors, 
and the process of conflict management. The main drivers of conflict and/or consensus, within and across 
languages were different types of group identifications in relation to the topic of Kosovo and Wikipedia 
in general. Wiki software and Wikipedia’s rules help in managing multiple conflicts, although the political 
and cultural contentiousness of the topic existing in the “offline” context was also reproduced in the
collaborative process.


1 Introduction


Wikipedia is a complex socio-technical system based on specific collaborative software and a set of rules, 
policies and guidelines for producing content. It runs on MediaWiki software, which is a modified version of
the original wiki software developed in 1995 to enable user-friendly, transparent collaboration and online 
database management. The normative side of Wikipedia was developed to ensure a clear set of rules for 
producing reliable encyclopedic content. The core principle is the neutral point of view (NPOV) which 
should present all sides without bias within a given article. With the growth of Wikipedia, specific rules 
and policies have multiplied making content management on bigger language versions an increasingly 
complex affair (Kittur, Suh, Pendleton and Chi, 2007).

Wikipedia consistently ranks among the top ten sites on the web and is offered in almost three hundred
different languages. However, Wikipedia’s neutral and objectivist knowledge is not always easy to secure,
especially in light of the existing language and cultural diversity through which knowledge is differently
contextualized. Instead of focusing on the technical and normative side of content production, we will 
explore the social and cultural dimensions of collaboration on a conflictive topic across three different 
language versions. The common technical and basic normative context of Wikipedia makes cross-language 
comparisons justified. Our goal is to explore what the differences are, and how these differences are 
contextualized. The empirical focus is on the recently formed Republic of Kosovo in Southeast Europe. As
a highly contested, but notable, topic, it sparked heated regional and international political debates, gained
significant media coverage, and motivated editors across language versions to contribute related articles
on Wikipedia. Its conflictive nature makes opposing points of view apparent and makes objectivist and
neutral content production hard to achieve. Due to the previous integration of the partially internationally
recognized Republic as a province within the borders of Serbia, the topic is particularly contested on Serbian 
Wikipedia. 

We employ a qualitative, explorative approach and compare editing dynamics, norms and values 
during the process of producing content about the Republic of Kosovo on Serbian, Croatian and English 
Wikipedia. Gathered data was analyzed using the constructivist grounded theory approach (Charmaz, 1998) 
in an iterative two-step coding process within and between three different interaction contexts and language 
versions of Wikipedia. Due to the sensitivity of the topic, three coders were active in different stages of the 
process to ensure a higher degree of interpretative flexibility. The goal was to create a substantive grounded 
theory (Glaser and Strauss, 1967) explaining the editing process of this specific and problematic topic in 
several selected languages. Our research shows that representing all sides of the story in an article is
extremely complex, especially in a situation that carries historical burdens and emotional tensions and that
changes and evolves rapidly. The main drivers of conflict and/or consensus within and across languages were
different types of self- and group identifications in relation to the topic of Kosovo. Wiki software and 
Wikipedia’s rules help in managing the conflicts, but the social, cultural and normative aspects of these 
identities make it complicated. Nonetheless, Wikipedia offers an online mediation forum where anonymous
contributors with diverging positions directly meet and negotiate under a common technological and
normative context.


2 Wikipedia: global, multi-language, knowledge-building environment 


Launched on 15 January 2001, Wikipedia is now ranked 7th on Alexa’s top sites list.¹ While the quality
and accuracy of the articles is sometimes heavily debated due to Wikipedia’s “anyone can edit” approach,
two intrinsic control mechanisms regulate the editing process, namely the five pillars² and multiple policies
and guidelines. The five pillars explain what Wikipedia is and what it stands for (a free encyclopedia with
no firm rules, written from a neutral point of view by the public, for the public, by editors who, above all,
are to treat each other with respect and civility). Policies and guidelines expand on the main pillars and give
guidance on the content of articles, conduct, deletion, enforcement of various standards, legal
considerations and remedies, etc. The core content production policies (verifiability and a neutral point of
view, with no original research) demand that an article be written so that it represents “fairly,
proportionately, and... without bias, all of the views that have been published by reliable sources on a
topic.”³


1 Retrieved from http://www.alexa.com/topsites 
2 Retrieved from http://en.wikipedia.org/wiki/Wikipedia:Five_pillars 
3 Retrieved from http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view 

Figure 1: Standard content production model (objectivist)


By closely adhering to these rules and policies, content production would likely take the form presented in 
the diagram above. There is an editor, or a group of editors, interested in writing about a certain topic. By 
following the basic rules they create, or revise, an article. If there are disputes and conflicts, they are 
discussed and resolved on the talk pages. Consensus is reached and a revised article created. This process 
should, in principle, work the same way in all language versions. 


3 Related research 


Research on social interaction on Wikipedia and on the organization of the work process has focused on the
consensual aspects of community identity, and on conflictive and disruptive behavior of different kinds. Bryant,
Forte and Bruckman (2005) showed how long-term and frequent exposure to the editing process changes 
editors’ perceptions of Wikipedia. They start identifying themselves strongly with the project, and start 
thinking of it as a community of co-authors with distinct roles and talents. Pentzold (2011) similarly showed 
that Wikipedia is an “ethos-action community”. In other words, community membership and its boundaries 
are defined by a set of standards regarding Wikipedia's purpose, norms, values, and valid actions. 

Viegas, Wattenberg and Dave (2004) used visualization software to study article revision history
and found different patterns of cooperation and conflict. Conflicts were evident as series of reverts in
article revision history. Other forms of disruptive behavior include, for example, trolling as repetitive,
intentional, and harmful violations of Wikipedia’s policies (Shachaf and Hara, 2010). However, as Kittur
et al. (2007) showed, conflict should not only be seen in a negative context, since it also offers positive 
benefits such as resolving disagreements, establishing consensus, clarifying issues and strengthening common 
values. 

Some studies have started probing into Wikipedia's language diversity, trying to compare
communities according to different criteria. Pfeil, Zaphiris and Ang (2006) conducted research on the 
connection between collaborative patterns on Wikipedia and cultural and national backgrounds of editors 
on four language Wikipedias. The study shows that Wikipedia is not a culturally neutral space and that 
differences in behavior can be observed. Hecht and Gergle (2010) performed an analysis of 25 different 
language versions comparing the concepts included in each edition, and comparing the ways in which these 
concepts were described. The authors challenge the so-called “global consensus hypothesis” of objective and 
neutral knowledge and demonstrate significant cultural diversity across different languages on Wikipedia. 
Yasseri, Spoerri, Graham and Kertész (2013) analyzed similarities and differences between controversial 
topics in 10 different language Wikipedias, simultaneously relating them to geographic locations. The 
authors conclude that different language Wikipedias demonstrate divergent social-spatial priorities, 
interests and preferences. 

Quantitative, big data analyses have confirmed that the objective and neutral notion of knowledge 
is hard to reach, given the substantial language diversity of Wikipedia communities. However, while they 
do detect diversity, they cannot completely understand, or explain, the actual processes of managing, 
negotiating and establishing these differences. Culture is closely determined by language and, despite 
intensive globalization and transnational processes, still to an extent by territory. But culture is not simply 
a homogenous territorial and language-dependent concept. It is also dependent on different identities within 
a certain language and territory in a process of their constant discursive formation and negotiation. 


4 Methods 


The grounded theory approach (Glaser and Strauss, 1967; Strauss and Corbin, 1990; Charmaz, 1998) refers to 
a research process well adapted to explorative studies of human interaction in specific contexts. The focus 
is on inductively creating mid-range theories through a systematic and iterative process of parallel data 
collection, coding and analysis. The goal of our research was to create a so-called substantive theory, 
that is, a theory grounded in the empirical situation and developed without forcing 
it into pre-conceived theoretical ideas (Glaser and Strauss, 1967). A completely open mind, without reliance 
on previous theoretical and methodological ideas or cultural assumptions, as advocated by Glaser and Strauss 
(1967), is never entirely possible. Nonetheless, we took great care to ensure our emergent codes and categories 
were not forced into existing theoretical discourses or assumptions. 

The initial fieldwork site was the Serbian Wikipedia and its Kosovo and Metohija article, the only 
existing article on the topic of Kosovo in the Serbian language at the start of our research. Since the Republic 
of Kosovo was previously a province within Serbia, the editing process in this language community was 
selected as the first research site. The main data sources throughout the study were article and user talk 
pages related to the Republic of Kosovo⁴ as a form of “non-reactive data” (Janetzko, 2008). These documents 
provide detailed records of collaborative interaction as well as discussions on problems and issues that arise 
throughout the editing process. Talk pages also provide a transparent record of the work process in a 
longitudinal manner since every statement is signed and time-stamped by the editor. We also issued 
calls for interview participation to individual users in three different communities, conducted one 
interview, and engaged in e-mail discussions or discussions on individual talk pages with several editors during 
the first and second phase of the coding process. Although we expected to interview more respondents, even 
short discussions provided valuable insights into community values. For example, given our clearly stated 
Croatian background in the call for research participants and our use of the Croatian language and Latin script 
(Croatian is closely related to Serbian as a South Slavic language), our entry into the Serbian community was 
met with suspicion by some editors. They openly opposed our research, calling it “inappropriate for 
their project” on one occasion and “science fiction” on another. Despite a detailed project description and 
careful observance of ethical standards for online research, there was clear resistance to our entry into the 
community. These discussions also made us aware of the mutual construction of the research 
process between us as researchers and our respondents, and made us realize the role of our own pre-conceptions and 
cultural identities in framing the research process (Charmaz, 1998). The exposure to the underlying 
importance of identities within these communities set the course for our research and marked the 
emergence of significant categories in the ongoing coding process. 


4 See Annex for a list of documents. 

We used a two-step coding process (Saldaña, 2009). In the first cycle we used in vivo, process and 
initial coding. In vivo was used to record particularly salient wording, “process” to record the editing 
dynamic, and “initial” for other data that was difficult to label. The gathered codes were then tested, 
refined, compared and organized through second-cycle focused, axial and theoretical coding on newly sampled 
documents and new interaction contexts. Three coders took part in the first-cycle coding of the Serbian 
Wikipedia to keep an open-minded approach and to ensure a high level of interpretative flexibility. After 
each analysis cycle, meetings and discussions were organized to exchange codes and memos and to 
discuss emerging patterns. Theoretical sampling of documents was performed by following analytic 
leads such as references to specific talk pages, related topics or the creation of new topically related articles. 
For example, in the case of the English Wikipedia there were constant references to the Arbitration 
Committee’s decision to impose a “one revert per week” rule to deter disruptive edits. Hence the document 
containing the original Committee decision was sampled to saturate the emergent “WP rules” category as 
part of the conflict management process. This revealed the lack of similar third-party mediation possibilities 
on smaller language Wikipedias. Also, the Serbian community created the Republic of Kosovo article shortly 
after the start of our research, making it necessary to sample these documents as well.⁵ Upon finishing the 
coding of the Serbian Wikipedia, we expanded the sample to include the Croatian and English versions.⁶ This provided 
the opportunity to test the existing codes from the Serbian sample in new interaction contexts. Theoretical 
saturation was quickly reported by both coders on the Croatian and English data. However, certain procedural 
and value-related differences were evident, and these were coded and categorized into new elements of the emerging theory. 


5 Results 


5.1 Political context of the Republic of Kosovo 


The Republic of Kosovo is a partly internationally recognized country in Southeast Europe, previously the 
Autonomous Province of Kosovo and Metohija within Serbia. The majority of the population consists of 
ethnic Albanians, except in the northern municipalities, where Serbs form the majority. Starting in 
the late 1980s, nationalist tensions created a highly volatile situation, with Serbia revoking the autonomous 
province status. This caused resistance, riots and an insurgency against Serbian repressive measures. 
Insurgent campaigns and massive counter-insurgency military campaigns by Serbia ensued. With international 
mediation failing to calm the situation, a three-month NATO bombing campaign forced 
Serbia to withdraw its military and police presence in 1999. Following a UN Security Council resolution, 
Kosovo was placed under UN interim administration. The negotiation process between Serbia and 
Kosovo ended without agreement, and in 2008 Kosovo declared independence. A European Union-led 
law-and-order mission was given the task of improving the rule of law, while a NATO-led peacekeeping force 
was given the task of providing a secure environment. In late 2008 Serbia challenged the legality of the 
independence declaration before the International Court of Justice. In 2010 the Court released the opinion 
that the declaration did not violate general principles of international law. Nonetheless, Serbia still disputes 
Kosovo’s independence, and dialogue between the two countries has brought very limited success. An EU-mediated 
agreement was reached in April 2013 that, among other things, enabled Serbia to get a date for EU 
accession talks and Kosovo to gain more control over the Northern Kosovo municipalities. Despite the 
fact that Kosovo has joined various international organizations and gained recognition from a large number of 
countries including the United States, its independence is also disputed by, among others, five EU countries 
(Greece, Cyprus, Slovakia, Romania and Spain) and Russia, which has considerable leverage in blocking its UN 
membership due to its veto power in the UN Security Council. 


5 See table 1 for more details on article creation timelines. 
6 The sampling of other languages such as Albanian (Albanians being the dominant ethnic community in Kosovo) was not possible 
given our unfamiliarity with that language. 

Figure 2: Kosovo in Southeast Europe⁷ 


5.2 Multiple contextualization on Wikipedia 


Due to the contested nature of the topic, it was contextualized in different ways in different languages. The 
table below displays the article creation timelines and the different articles under which Kosovo is discussed. 
The Serbian-language Wikipedia contained information about the Republic in the Kosovo and Metohija 
article until recently, when a separate Republic of Kosovo article was created. The Croatian-language Wikipedia 
contains related information under the Kosovo article, and the English version has two articles (Autonomous 
Province of Kosovo and Metohija and Republic of Kosovo) that were separated from the main Kosovo 
article after the declaration of independence. 


                                             Language version and article creation date 
Article title                                sr.wikipedia.org    hr.wikipedia.org    en.wikipedia.org 
Kosovo                                       -                   8/5/2005*           15/12/2001⁸ 
Kosovo and Metohija                          6/9/2004*           -                   - 
Autonomous province of Kosovo and Metohija   -                   -                   21/2/2008* 
Republic of Kosovo                           29/4/2013*          -                   25/7/2009* 


Table 1: Article creation timelines (* signifies the analyzed articles) 


Our study resulted in a total of 439 codes,⁹ with the core concept being neutrality under heavy dispute. 
Neutrality is dependent on four core elements: identities and viewpoints, their implications for the process 
of content editing, consequent relations between editors, and types of conflict management. These four 
elements are grounded in 21 categories and 55 sub-categories. The diagram below displays the relations 
between the core elements of the concept and the relations between those categories. The model is common 
to all three language versions, although variation in the intensity of existing identities and viewpoints can be 
discerned, with consequences for the nature of disputes and types of conflict management. We provide 
explanations and examples in the following sections. 


7 Retrieved from http://en.wikipedia.org/wiki/File:Kosovo_in_its_region.svg 

8 The Kosovo article on the English Wikipedia contains several thousand discussion pages. Due to the explorative character of this study 
and the limited research resources available, we were not able to fully analyze this talk page. 

9 The total number of codes is substantial; however, the majority of them relate to very subtle differences within categories and 
sub-categories. In the planning stage of our research we opted for dense coding in relatively small-sized documents. 

Figure 3: Conflictive content production model 


5.2.1 Identities and viewpoints 

Identities are explicit values and statements relating to the self- and group identification of a group of editors 
within Wikipedia in relation to the topic at hand. The dominant identity in all three language versions was what 
we called the encyclopedic identity. It is closely related to the existing rules and policies of Wikipedia, but 
does not always address them explicitly during the collaboration process. This identity is evident within a 
group of editors who take Wikipedia’s basic policies, guidelines and goals as a form of self-identification 
and who direct the editing process, manage relations and engage in interaction with other 
editors accordingly. It is a set of beliefs that Wikipedia should be careful when following current events and 
opinionated media reports; that objectivity is always achievable and highly desirable; that Wikipedia should 
always follow only the most relevant sources; that Wikipedia educates and informs people; and that 
Wikipedia is an international project based on clear rules. As one editor put it, explaining the objectivity 
principle: “It would be ideal if from the edits and comments ... one could not determine the political 
standpoint or nationality of the editor” (Razgovor: Kosovo). 

Language and territorial identity is the second type of identity. It is not entirely coherent or 
homogenous within the analyzed language communities since it relates with different dynamics to the 
encyclopedic identity and to differences in viewpoints towards Kosovo. For example, in both the Serbian 
and Croatian communities, a clear separation between the language and the official state position towards 
Kosovo was explicit.¹⁰ In that sense this identity type becomes closely related to the encyclopedic identity. 
However, it can also be used to legitimate the official state position. For example, in the Serbian Wikipedia: 
“This is SERBIAN Wikipedia. We have a right to our own opinion” (Razgovor: Kosovo i Metohija). It can 
also be used to demarcate the group boundaries from “outsiders” talking in different languages. For example, 
in the Croatian Wikipedia, certain editors were opposed to the Serbian grammar style used by a Serbian 
editor. Or in the English Wikipedia: “Your standard of English is very poor” (Talk: Republic of Kosovo). 

10 The Republic of Croatia officially recognized the independence of the Republic of Kosovo. 

These tensions get amplified when editors explicitly take one side and simultaneously exclude the other 
in relation to Kosovo’s independence. They are not typical of any one language community either. Nonetheless, 
the strongest pro-Serbian viewpoints can be found on the Serbian Wikipedia. They stem from an emotional 
response to the territorial loss (“painful topic”, “delicate subject” (Razgovor: Kosovo i Metohija)) and the 
perception of the international community as unjust towards Serbia (“Serbs are always to blame” 
(Razgovor: Republika Kosovo)). Simultaneously, the independence is not accepted as legitimate and these 
positions sometimes turn into anti-Albanian attitudes and direct ideological and nationalist oppositions. 
They question the origin of the Albanian ethnic community and its religious orientation, and criticize the 
Albanian Wikipedia: “Unchecked, they can write whatever they want” (Razgovor: Kosovo i Metohija). The 
pro-Albanian viewpoints are weak in all three communities.¹¹ In the Croatian community they can be traced 
in connection to historical interpretations and the acceptance of the legitimacy of independence (“Kosovo 
was always inhabited by Albanians” (Razgovor: Kosovo)) or sometimes in closely related anti-Serbian 
and nationalist viewpoints. One editor made a sarcastic remark about another: “You have a predictably 
anti-Serbian attitude and if 3 Martians landed in the Serbian Woodlands and proclaimed their state, you 
would grant them the right to do so” (Razgovor: Kosovo). These views, however, are not dominant but are 
in constant negotiation and struggle with the encyclopedic values and positions to impose certain 
interpretations on both local communities. The English Wikipedia seems to offer a forum for both 
pro-Serbian and pro-Albanian viewpoints, making it difficult to negotiate a middle path between all of the 
existing identities and viewpoints. 


5.2.2 Editing the article content 


Actions based on previously defined identities and viewpoints lead to interpretations of different aspects of 
content editing and article structuring. This part of the process gets fragmented into a myriad of polarized 
debates. The issues often blend from one into the other. The most salient in all communities is the problem 
of a balanced introductory sentence and paragraph, followed by the organization of infoboxes and the 
remaining topical paragraphs. Article information often relates to detailed fact checking (list of states that 
have recognized the independence of Kosovo, demographic information, GDP, etc.) and using appropriate 
sources to fill in the blanks. Terminology relates to the way in which sentences and information are presented 
and described. Depending on the article context, local communities were focused on clarifying the status of 
the Republic as simply a territory, an autonomous province, or the Republic every time it was mentioned 
within the article. The English community faced a similar issue: “[t]he article is already crammed full of 
‘partly recognized’ and ‘unilateral’ and so on, at every place in a sentence where somebody can cram in a 
caveat” (Talk: Republic of Kosovo). 

Media and other information sources are rarely commented on but can also be used to support or 
dispute certain claims. A significant amount of coordination work between language versions of Wikipedia 
existed, whether looked upon critically or positively. For example, the Serbian community swung 
between open criticism of a perceived bias in the English articles and acceptance of the thematic 
split between the articles: “The English solution is perfect” (Razgovor: Kosovo i Metohija). In the Croatian 
community other wikis were consulted to determine the frequency of use of the term Republic of Kosovo. 
The English community also sought differences in contextualizing the topic, finding similarities between the 
Serbian and Russian Wikipedia. 

11 It is noteworthy that the Republic of Kosovo’s Ministry of Foreign Affairs, in cooperation with several international organizations, 
organized the Wiki Academy Kosovo in 2013 to “[i]mprove the quality and quantity of online content on Kosovo to better represent 
Kosovo to the world.” The Academy included monetary prizes for best articles and photos representing Kosovo. Retrieved from 
http://wikiacademykosovo.org/ 

The area of contemporary and historical interpretations of Kosovo-related topics is where ideological 
struggles are most evident. Contemporary interpretations relate to criticisms of international law as being 
“useless” or the declaration of independence as being a “shady area”. The international community is seen 
as using Kosovo as its satellite, denying Serbian constitutional demands for Kosovo. Historical 
interpretations relate to the ethnic settlement of the area by Serbs and Albanians and demographic changes 
of the area through history. Finally, links between Wikipedias (interwiki links), intended to directly link 
same-topic articles between languages, were disruptively edited in the English Wikipedia. Since the 
Republic of Kosovo article was split off from the main Kosovo article, its creation was perceived as a 
sort of “wiki-recognition” of the existence of that state. Links to other Wikipedias having similar Republic 
of Kosovo articles were deliberately hidden (Wikipedia: Arbitration/Requests/Enforcement). 


5.2.3 Relations between the editors 


Instances of editors from other Wikipedias doing cross-wiki commenting on local communities are mostly 
episodic. However, editing the English Wikipedia is seen as an opportunity for giving greater visibility to 
one’s positions, whether encyclopedic or political. A number of editors with user accounts on local versions 
were active in the editing process on the English Wikipedia. 

Exposure to the complex editing process leads to different relations between editors. In general, 
editors who display encyclopedic identity will seek more cautious relations with other editors, taking into 
account their positions and trying to find common ground. Caution relates to giving thought to one’s 
edits, taking time with reverts, thinking through the thematic split of related articles, being polite towards 
new editors, etc. For example, one editor stated: “[t]his is an encyclopedia so we cannot play dumb and 
pretend something [Republic of Kosovo] does not exist” (Vikipedija: Clanci za brisanje). 

Conflicts arise from directly opposing positions between editors, breaking previous rules, agreements 
and neutrality principles. Conflicts and disruptive edits lead to a general lack of dialogue and open labeling 
of other editors through sarcasm, cynicism, open political opposition or malicious personal attacks. For 
example, one editor expressed his frustration in the following way: “I’ve had it with fighting windmills, 
getting no support and being insulted...” (Razgovor: Kosovo). This poses the greatest challenge for 
Wikipedia as a project since it may lead editors to quit article editing or the project in general. It was 
often expressed as a feeling of frustration with the lack of consensus, incessant pushing of particular views, 
and repetition of similar non-constructive arguments: “I am at a loss as to how to move this debate beyond 
the ‘is not!’ - ‘is too!’ stage” (Talk: Republic of Kosovo). 


5.2.4 Conflict management 


Conflict management relates to all types of managing neutrality disputes and conflicts. It is performed 
through constant article evaluations or criticisms, adherence to rules and policies or alleviating conflicts by 
performing thematic splits between closely related topics. It is mostly done by administrators during 
neutrality or revert disputes. The measures include calls for discussions on talk pages or the introduction 
of different levels of article protection. 

Caution in approaching the editors and the editing process leads to a positive evaluation of the 
article structure, its component parts and overall existence: “We cannot ignore the existence of the Republic 
of Kosovo” (Razgovor: Republika Kosovo). Caution also leads to careful adherence to Wikipedia’s rules 
and policies in mediating the conflict and a carefully conducted thematic split. This was heavily debated in 
the English community where a split from the Kosovo article and the creation of Autonomous province of 
Kosovo and Metohija and Republic of Kosovo occurred. In the Serbian community the Republic of Kosovo 
article was split from the original Kosovo and Metohija article. However, it was heavily criticized as being 
a poor copy of the English versions: “Incorrect junk from the English wiki” (Razgovor: Republika Kosovo). 

Conflictive relations between the editors will generally lead to negative article evaluation and article 
criticism, disapproval of rules and policies, criticism or blocking of the thematic article split, etc. Conflict 
management becomes difficult since the editors do not always perceive a discrepancy between Wikipedia's 
rules and their points of view. For example, an administrator on the English Wikipedia noted that one editor “... does 
not seem to realize that there is any Point-of-View (POV) problem with his edits” (Wikipedia: 
Arbitration/Requests/Enforcement). A key difference between local communities and the English version 
in conflict management procedures is the existence of the Arbitration Committee in the English community, 
which has the authority to impose binding solutions in cases of serious disputes. Due to repeated conflicts, 
a number of warnings and bans regarding Balkans-related topics have been issued by the Committee in the 
past several years (Wikipedia: Requests for Arbitration/Macedonia). Its enforcement decisions have 
generally led to greater caution in the editing process, although persistent problems can still lead to 
disruptive edits and to editors quitting the Wikipedia project. 


6 Broader theoretical relevance and future research 


The relevance of identity in understanding online behavior is nothing new. In the past two decades it has gained 
prominence as one of the key research areas in internet studies. However, surprisingly little attention has 
been given to the issue within Wikipedia research, apart from a focus on the core community identity. Our 
research shows that different community identities (encyclopedic, language, territorial) are not fixed, but 
flexible and negotiable. In other words, identity is relational and includes “boundary” work between 
definitions of “us” and “them” (Lamont, 2006). However, the intensity of this process also depends on the 
topical focus of the collaborative process. As Stryker and Burke (2000: 288) claim, behavior is goal-directed 
and changes the situation in order to match the meanings perceived in the situation with meanings held in 
the standard. A mismatch between one’s identity and the meanings perceived in the situation will result in 
negative emotion, while a decreasing discrepancy will result in positive emotion. 

Similarly, Rössel and Collins (2006, p. 515) believe that “emotional energy” is the driving force of 
all “interaction rituals” with people seeking to maximize their own level of emotional energy. Past 
experiences get accumulated and direct individuals in seeking, or avoiding, certain interactions. 
Maximization of one’s emotional energy can be done at the expense of the other, which means that different 
exclusionary tactics, pressures and patterns of dominant behavior can be exerted toward groups with an 
identity perceived to be different from one’s own. This is the point where open conflicts and ideological 
clashes occur. As van Dijk (1998) states, selected values serve as the basis for group self-identification. 
Dominant groups use the integration of values to legitimate their opposition, disagreement and resistance 
and to emphasize the polarization between “us” and “them” by showing “us” in a positive way, and “them” 
in a negative way. 

The issues of identity work, emotional energy and ideological struggles were particularly salient in 
our research. However, they are by no means exclusive to these selected language versions and communities. 
In future research the presented substantive theory of neutrality disputes should be tested on other cross- 
language and conflictive topics on Wikipedia, including different language communities in the region and 
globally. Such tests could improve our understanding of conflicts, ideologies and the role of cultural identities 
in online contexts and help us understand if, and how, online forums might facilitate and/or manage conflicts 
and clashes. 


7 Conclusion 

Social interaction is always messy and difficult to put into clearly separate categories. This is especially the 
case when highly problematic political issues are being discussed and negotiated in three different, but also 
related, online contexts. However, understanding the background processes of structuring online information 
is vital in contemporary networked societies, especially regarding Wikipedia as one of the most popular 
global websites. After close examination of the collaborative process in this study, certain patterns are 
discernible. We have identified social, cultural and political aspects that drive the editing process and distort 
the objective, neutral content production. The context of a language community plays a crucial role, but 
its identity is not entirely homogenous or coherent. Common to all three language communities was the 
existence of a group of editors with strong encyclopedic identity. However, exclusionary, nationalist 
identities put the objective notion of knowledge construction under strain. Also, communities are not closed 
entities, but relate to, and observe, the editing process in other languages. A simple translation of article 
names to other languages might not refer to entirely the same topic. We have seen that the Republic of 
Kosovo was differently contextualized in different language communities and at different points in time. 
The identities and viewpoints shape the relations between editors and lead to either cautionary or conflictive 
editing. Conflict management includes constant article evaluations, article criticisms, article forking and 
adherence to existing rules and policies. Its success will depend on the intensity of included identities and 
viewpoints, and also on the size of the community and the availability of different conflict management 
mechanisms. 

Overall, it seems that language and territory do not produce coherent and homogenous wiki 
communities, and hence do not produce homogenous and coherent knowledge. On the contrary, it appears 
that these Wikis re-produced political and cultural conflicts and diversity already existing in the “offline” 
contexts, adding to them the existence of purely online identities, such as the encyclopedic identity. The 
difference in comparison with the “offline world” is the possibility of direct, repeated confrontation between 
diverging positions in the negotiation process during collaborative creation of encyclopedic articles. In that 
sense online contexts such as Wikipedia provide mediation forums and environments for discussing difficult 
and problematic contemporary issues. Neither consensus nor conflicts are stable behavioral patterns on 
Wikipedia. Even small concessions of short-lived consensus after long-term conflicts regarding minute 
details of a given topic add up to the creation of diverging encyclopedic articles in different languages. These 
processes significantly alter the outlook of encyclopedic knowledge as represented in Wikipedia’s articles. 
While related research has already emphasized the language diversity of Wikipedia’s communities, it has 
provided limited explanation as to how and why these differences occur. A transversal approach to analyzing 
large amounts of quantitative data can distort the results since conflictive situations are prone to quick 
alterations over short periods of time. These dynamics can be better detected by taking note of the historical, 
cultural, political and other contextualizations, and by closely analyzing complex internal negotiations in a 
longitudinal manner. A given language should not be taken as an essential character trait of individual 
Wikipedia communities, nor should it be equated with geographic location or culture as a whole. We hope 
this study will provide further impetus for studying these differences and for shedding new light on the 
ways in which online and offline contexts interact and provide the capacity to change historical processes 
by offering alternative and novel communicative processes and negotiation forums. Whether Wikipedia’s 
neutrality and objectivity principle can always transcend intense clashes and conflicts is yet to be studied 
and confirmed. 

Abstract 

The aim of this study is to investigate how ALM professionals conceptualise the common role of archives, 
libraries and museums (ALMs) in the contemporary society. There is only a little earlier empirical 
research on the topic. This study is based on a quantitative analysis of the results of a web survey of 131 
ALM professionals. The analysis shows that the views of the respondents epitomise diverging and 
contradictory ideas of the role of the institutions. The findings underline the need to discuss and define 
the future of the ALMs on a profound level of their societal role with a clear emphasis of its theoretical 
underpinnings. The diverse of opinions and number of mostly practice-oriented visions can be helpful in 
shaping and reshaping the role of the institutions. At the same time, it is apparent that they do not have 
the required theoretical depth to function as a common ground for explicating the role of ALMs in the 
contemporary society. 
1 Introduction


The past two decades have witnessed an increasing political interest in memory institutions (archives,
libraries and museums, ALMs) and their role as shapers of the future society (Trant, 2009). At the same
time, some of the proponents of digital information technologies have heralded the Internet age (Usherwood
et al., 2005a) as their end. Even if it is probably too hasty to doom the ALMs altogether, many ALM
professionals have also acknowledged the impact and convergence of technologies and cultural changes such as
the rise of user orientation (Holmberg et al., 2009; Ridolfo et al., 2010; Srinivasan et al., 2009) and a
consequent need to change some of the traditional tenets of the institutions. The relative significance of
physical collections at libraries has been recognised to be diminishing (Baker, 2007). Museums have begun to
develop digital presences and to break out of their traditionally monumental walls (Marstine, 2006), and
archives professionals have observed that in the digital age, ’archiving’ has ceased to be a monopoly of
professional archivists (Featherstone, 2006). A review of the earlier literature shows, however, that much of
the discussion revolves around the topics of their public function, commerciality and anti-commerciality,
cooperation, barriers, technology, marketing, trustworthiness and empowerment. In spite of the scale of the
debate, there is little empirical research on how the professionals and the public perceive the future
prospects of the ALMs. The earlier works consist primarily of opinion pieces, political programmes and
theoretical literature (e.g. Anderson, 2007; Barry, 2010). Most of the existing empirical research has
been conducted with the visitors or users of the institutions, not with professionals (e.g. Julien & Genuis,
2011; Usherwood et al., 2005a).
The aim of this study is to address this gap in the earlier research and to study how the future role
of ALMs is conceptualised by professionals working at the institutions in terms of how they perceive the
significance of the predominant views expressed in the earlier studies and in the theoretical and
practice-oriented literature. The theoretical underpinnings of the study are based on the socio-constructivist
assumption that the future role of the ALM institutions in society is influenced by how different stakeholder
groups conceptualise it. The assumption is premised on a common, although often implicit, postulate of futures
studies that representations have a performative potential (Fuller & Loogma, 2009). Even if stakeholder
theorising has been justly criticised for assuming that all stakeholders are influencers (Donaldson & Preston,
1995), unlike non-influential stakeholders such as job applicants, the ALM professionals have plenty of
opportunities to operationalise their representations of the future as a part of their daily work. Together
with earlier research on how other stakeholder groups conceptualise the future of ALMs, this study provides
insight into and useful knowledge of the professionals’ point of view for future research on the societal role of
the ALMs. The perceived relevance of the collaboration of ALMs and the introduction of such umbrella
concepts as the “memory institution” have captured political, theoretical and professional imagination to such a
degree that it is easy to argue that the notion of convergence warrants a critical discussion of the future
role of these institutions in a single study (Trant, 2009). The findings of this study provide understanding
of how ALM professionals conceptualise the role of their institutions and what kinds of assumptions and
perspectives steer their daily work. The results also function as a baseline for future qualitative research on
the present and future role of specific ALMs and provide a basis for developing strategic planning at the
institutions.


2 Literature review 


2.1 Studies on the perceptions of the present and future 


The present and future role of librarians, archivists and museum professionals is a popular topic in the
professional debate (e.g. Abram, 2007; Bailey, 2006; Norberg et al., 2009). ALM institutions have also
captured the imagination of many widely cited theorists (e.g. Ebeling & Günzel, 2009; Foucault, 2002),
even if the connotations of these theoretical and often metaphorical conceptions tend to differ from the
practical reality of the institutions (Ebe, 2009). Another line of theoretical discussion that has had a more
direct impact on the development of the notions of memory institutions and related terms such as ALMs,
LAMs (libraries, archives and museums, e.g. VanderBerg, 2012) and GLAMs (galleries, libraries, archives
and museums, e.g. Lim & Liew, 2011) stems from the theorising of the similarities in the societal role of
archives, libraries and museums. Rayward and Jenkins (2007) articulate a widely shared view that “[t]he
collections and services of libraries and related agencies, such as museums and archives, are important
components of social and institutional memory”. On the level of the function of the holdings of the ALMs,
Buckland’s discussion of the nature of information (Buckland, 1991) and documents (Buckland, 1997), and
the later revival of the documentation movement have highlighted the documentary similarities of museum
objects, archival records and library materials (Latham, 2012; Lund & Buckland, 2008). According to
documentation theory, the holdings of all three types of institutions can be conceptualised as documents.
Bates assumes a different point of view on the nature of their holdings and discusses them as published
(libraries), unpublished (archives), embedded (museums) and embodied (museums of natural history)
information. In spite of her different perspective on the nature of the collections, she argues that ALMs and
their related scholarly disciplines have a common ground in a shared interest in bringing together “objects
of social interest for research, learning, and entertainment, and make them available to an audience” and
suggests that the scholarly disciplines of archival, library and museum studies can be described as “collection
disciplines” (Bates, 2006).
In contrast to the lively theoretical discussion, there is only relatively little empirical research on
(2009) conducted a Delphi study on the research needs in Swedish librarianship among professional 
librarians in the country. The findings have an obvious indicative value of the anticipated future challenges 
in the field. The vague consensus of opinions underlined the diversity of expressed priorities among the 
informants. 

Librarians tend to be more anxious to emphasise change than the users and non-users of libraries 
(Wagman, 2011; Sinikara, 2007), and the difference of opinions can be a considerable source of tension 
between the priorities of conservative users and more progressive professionals (Sinikara, 2007). 
Comparative studies have highlighted certain rather obvious differences between the different ALM 
professions. Kearns and Rinehart (2011) compared the information responsibilities of archivists and 
librarians. Both groups considered access (to information) to be their first priority. Archivists took more 
responsibility for preserving, processing, collecting and management whereas librarians were more inclined 
to emphasise evaluation and research as significant aspects of librarianship together with the somewhat 
controversial task of teaching (Julien & Genuis, 2011). 

In addition to the relatively few empirical studies of the views of the professionals, there is a small 
corpus of literature on popular perceptions of the ALMs. Usherwood et al. (2005a) conducted a large 
nationwide survey in the UK that was used in the development of the questionnaire for the present study. 
The researchers concluded that the public perceives ALMs as relevant repositories of public knowledge. The 
institutions are considered to be relevant and trusted even if they are not used by everyone all the time. 
Evjen and Audunson (2009) found in a study of the Norwegian users and non-users of libraries that the
traditional public library values were firmly established but that, at the same time, the informants were
in general open to change and new services.


2.2 Professional perspectives 


In comparison to the relatively small number of empirical studies on the anticipated future role of the ALM
institutions, there is a large corpus of professional and theoretical literature focusing on the current
strengths, and expected and endorsed future priorities and relevance of the institutions. Many ALM-related
authors tend to emphasise the continuing value of the institutions and their fundamental principles (e.g.
Rosa et al., 2011; Duranti, 2010; Gilliland-Swetland, 2000), but only a few are inclined to see their future
without any major discontinuities. The emphasis on enduring values tends to be related to a perception that
the principal challenge of the institutions is to market their existing services and competences in new
operational contexts (Duranti, 2010; Gilliland-Swetland, 2000). For instance, the major proponents of the
recently popular notion of Library 2.0, Casey and Savastinuk (2007), perceive their primary task to be to
get more people into the libraries. Similar priorities dominate a large part of the marketing and
outreach-oriented ALM literature (e.g., Ambrose & Paine, 2006; Cerquetti, 2010; Nesta & Mi, 2011; Singh,
2009; Smith, 2003).

In contrast to the preservationist tendencies of many authors, others have been eager to emphasise
discontinuities. Phenomena like Library 2.0 (Holmberg et al., 2009), participatory archives (Huvila,
2008) and participatory librarianship (Lankes et al., 2007) have called attention to the inevitable change.
Calls for developing new research agendas and infrastructures for archival science (e.g. Gilliland &
McKemmish, 2004; McLeod, 2008), and for the reappraisal of the role of museums in society (Genoways,
2006a) represent explicit attempts to enthral the future. In 2009, the Institute of Museum and Library Services
(IMLS) published a discussion guide, which describes a series of discussions between library and
museum professionals on the future prospects of the institutions (Pastore, 2009). The discussed themes
included the changing role of museums and libraries, shifts in power and authority, the notion of ’third
place’, technology and policies, changing practices of learning and information use, collaboration of libraries
and museums, sustainability, evaluation and the future employees of the institutions. The list is not
exhaustive, but represents some of the principal gravitations of the professional discussion in the literature
(e.g. Kelly et al., 2009).

The discussion on the changing role of ALMs tends to frame the institutions from two perspectives.
The first is emphatically utilitarian. ALM institutions are perceived, either explicitly or implicitly, to
share a mission of preserving and providing access to knowledge, supporting learning and promoting identity
and understanding (e.g. Bowitz & Ibenholt, 2009; Cerquetti, 2010; Gilliland-Swetland, 2000). They are also
suggested to play a role as an economic resource and as a provider of direct and indirect quantifiable return
on investment (Wavell et al., 2002). The institutions are seen as a societal resource and have a role as
informal educational institutions that contribute to the success and prosperity of societies (Dempsey, 2000;
Manzuch, 2009; Torstensson, 2002).

The second perspective puts emphasis on abstract societal and cultural values and rights. In parallel 
to a broader cultural political debate, ALM professionals and academics have discussed the civic role of the 
ALM institutions in the light of classical and contemporary social theory (e.g. in Costantino, 2012; 
Genoways, 2006a; Granström, 2002; Hickerson, 2001; Jimerson, 2004; Leckie et al., 2010). Access to the
assets hosted and represented by the institutions is perceived as a new civic right (Dempsey, 2000) 
independent of the cultural background of the citizens. An overall line of argument of the discussion is that 
the traditional colloquial ideas of the role of the ALMs are out-dated in the context of the currently 
dominant paradigm of socially oriented archives, libraries and museums. The critique has stemmed both
from continental critical theory (e.g. Henning, 2006; Leckie et al., 2010) and postmodernism (e.g. Cook,
2001). Researchers have been keen to expose traditional hierarchies, sub-textual ideologies and the
predominantly Western cultural underpinnings of the ALM institutions that do not make sense in all
cultural contexts, globally (Duncker, 2002) or locally (Shilton & Srinivasan, 2008). Subsequently, the critics
have stressed the necessity of redefining the role of the ALMs from the perspectives of broader inclusiveness
and global representativeness (e.g. McKemmish et al., 2005).

Even if the critique of the established ideas of ALM institutions is often directed against the
traditional credo within the ALMs, the idea of an inevitable societal and cultural change can be linked to
a broader ideological project. Sahlén (2005) describes this adaptation to the subtext of the dominating
contemporary ideologies as “modernisation”. The earlier idea of ALMs was based on the assumptions of 
stability (Martinon, 2006), existence of an intrinsic value of the institutions, positivist ideas of their 
impartiality and reliance on established unarticulated hierarchies of control and valorisation (Cook, 1997; 
Henning, 2006). 

In spite of the frequent emphases on the similarities of the ALMs, there are also fundamental
differences in how the three types of institutions and their roles are conceptualised in the literature. Museums
underline the role of experiences, authorship and exploration (Genoways, 2006a; Gilliland-Swetland, 2000).
Library literature and practice have traditionally focused on access, community building and lately more
and more on learning and information literacy (e.g. Gilliland-Swetland, 2000; O'Connor, 2009). In the archival
field, there are several competing perspectives that conceptualise archives as information institutions (e.g.
Buckland, 1991; Gilliland-Swetland, 2000) or cultural heritage institutions (Manzuch, 2009), or that 
emphasise their distinctiveness by highlighting the non-informational and non-cultural nature of archival 
records as pieces of authentic evidence (e.g. Duranti, 1999). Also, even if some authors (e.g. Bates, 2006) 
perceive all ALMs as collection institutions, the collection focus tends to be stronger in museums and 
archives (e.g. Gilliland-Swetland, 2000), whereas libraries are portrayed more frequently as information 
providers (Hill, 1999, 106-107, 191, 204). 
3 Material and methods


The aim of this study is to map the future role of ALMs as it is conceptualised by ALM professionals. In
order to control the effect of contextual variables, the population was limited to professionals working in
archives, libraries and museums in Sweden. The data were collected using a survey questionnaire. The
survey was conducted online using LimeSurvey 1.90+ open source survey software. The data were analysed
using correlation analysis (rcorr and cor.test) and descriptive statistics (psych.describe) in R 2.12.2. The
perceptions of ALM professionals were measured using a set of 22 statements presented on a 10-point Likert
scale about the future role and priorities of ALM institutions. The questionnaire was developed by four
researchers on the basis of an in-depth review of the earlier literature on the anticipated future of archives,
libraries and museums (principal sources Gilliland-Swetland, 2000; Merritt, 2008; Pastore, 2009; Usherwood
et al., 2005b). The statements are listed in Table 1. The rationale of the construction of the survey
instrument was to select issues that have been identified as significant in the literature (including
Gilliland-Swetland, 2000; Merritt, 2008; Pastore, 2009) and in the earlier studies of the attitudes of
non-professionals (Usherwood et al., 2005b) and to test whether and to what extent the professionals consider
that the same issues will have a significant influence in the shaping of the future role of their institutions.
According to the theoretical premises of the study, it was assumed that if professionals put a lot of emphasis
on, for instance, the role of technology, this idea is likely to have an impact on their actions and
consequently, on the future of the ALMs.
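
The descriptive statistics named above were produced in R. As a rough illustration only, the following Python sketch computes the same kind of per-statement summary for one Likert item. The function name describe_item and the sample answers are hypothetical and are not taken from the survey data.

```python
import math
import statistics


def describe_item(responses):
    """Summarise one Likert item: n, mean, sample sd, median, min, max and se."""
    n = len(responses)
    sd = statistics.stdev(responses)  # sample standard deviation (divides by n - 1)
    return {
        "n": n,
        "mean": statistics.mean(responses),
        "sd": sd,
        "median": statistics.median(responses),
        "min": min(responses),
        "max": max(responses),
        "se": sd / math.sqrt(n),  # standard error of the mean
    }


# Hypothetical answers to a single statement on the 10-point scale (not survey data).
example_item = [9, 10, 8, 7, 10, 5, 9, 6, 10, 8]
summary = describe_item(example_item)
print(summary)
```

The standard error here is the sample standard deviation divided by the square root of the number of answers, which is why items with a wide spread of opinions also carry a larger standard error.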

The respondents were recruited by posting invitations to major ALM-related mailing lists and social
media sites in Sweden including ark-forum, arkivet.ning.com (archives), biblist and biblfeed.ning.com
(libraries), nck-list (museum pedagogy) and sverigesmuseer.se (museums). The survey was promoted further by
using the personal contacts of the author and his colleagues, and social networking services including Twitter,
LinkedIn, Facebook and personal blogs.

The convenience sample consists of 131 Swedish ALM professionals with 80/131 (61%) females and 
44/131 (34%) males (7/131, 5% with no answer). 87% (114/131) of the respondents were 31-64 years old 
and 35% (46/131) between 51 and 64 years. 55% or 72/131 had an undergraduate degree and 38% (50/131) 
a master’s degree. Only three (2%) had acquired a doctoral degree and one had no formal education. 42% 
(54/131) identified themselves primarily as librarians or library professionals, 8% (10/131) as information 
specialists, 29% (38/131) as archivists and 11% (14/131) as museum professionals. The 14 (11%) respondents
who did not identify themselves with any of the four groups worked in archive-, library- and museum-related
governmental, administration, education, development and consulting duties. 16% (21/131) of the
respondents were employed by museums and heritage centres, 29% (38/131) by archive institutions, 42% 
(56/131) by libraries and 12% (16/131) by other institutions. 
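
The shares reported above are plain counts divided by the sample size of 131. As a small worked check, the following Python sketch recomputes the gender distribution from the counts given in the text. The function name shares is a hypothetical helper, and rounding to whole percentages is an assumption about how the figures were reported.

```python
def shares(counts):
    """Convert raw counts to whole-number percentages of their total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts as reported in the text: 80 females, 44 males, 7 with no answer.
gender_counts = {"female": 80, "male": 44, "no answer": 7}
gender_shares = shares(gender_counts)
print(gender_shares)
```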

According to the official statistics, in 2009, 56% of the Swedish museum professionals had an 
undergraduate degree or higher education. Males performed 47% of the person-years. The total amount of 
person-years was 4199 (Statens kulturråd, 2010b). The same year, Swedish libraries employed 8528 
individuals working for 7160 person-years (Statens kulturråd, 2010a). In 2010, 55% of the staff in public
libraries and 70% in research libraries had a library education. 83% of the staff of public libraries were women.
In research libraries the percentage of men was slightly higher, 27% (Kungliga biblioteket, 2011). In
summary, there is an unknown bias in the sample that has to be taken into account when interpreting the 
results even if the variety and distribution of the respondents may be seen as satisfactory for the purposes 
of this study. 


4 Analysis 


Mean         Statement

Statements (1 = Strongly disagree, 10 = Strongly agree)

7.83         Museums, libraries and archives play an important role in shaping civic and urban identity
8.2          Museums, libraries and archives should increase their role in shaping civic and urban identity.
(illegible)  In the twenty-first century, museums, libraries and archives are public services rather than
             urban commodities.
8.62         Museums, libraries and archives need to reassert their public function.
5.22         The fact that ALMs have been perceived to be important in the past is enough to justify why
             they are needed in the future.
6.81         Expectations of leisure time, pressure to increase productivity at work, and demands of
             family have a negative effect on information seeking and cultural participation (i.e. there
             is not enough time to seek enough information and participate in cultural activities)?
6.75         Museums, libraries and archives should adjust to these lifestyle demands?
8.29         Apathy in information seeking and participation in cultural activities (people do not care
             to seek information or culture) is dangerous for society?
7.33         Apathy in information seeking and participation in cultural activities causes apathy in
             voting behaviour?
5.76         The proper business of archives, libraries and museums is with the serious user
6.09         Archives, libraries and museums should seek to counter commercialism
4.39         Commercial activity supports the core activities of ALMs
7.3          By collecting and presenting popular culture, archives, libraries and museums can provide
             complementary non-commercial perspectives to that type of culture
3.37         Archives, libraries, and museums lose their special status and identity by embracing popular
             culture
8.72         Archives, libraries, and museums can help people to develop a critical capacity and a sense
             of discrimination
7.07         Archives, libraries, and museums should provide services which prioritise high intellectual
             standards and, at the same time, promote equity and social inclusion?

A specific emphasis of the following aspects of museums, libraries and archives is vital for the success of ALM
institutions in the future (1 = Strongly disagree, 10 = Strongly agree)?

8.22         Collections
8.02         Learned professional skills (in formal education)
8.06         Personal qualities
7.97         Conversation skills
8.07         Personal knowledge creation
7.5          Production of materials
8.08         Co-creation together with users

Archives, libraries and museums would benefit of co-operation on following issues (1 = Strongly disagree,
10 = Strongly agree)

7.21         Collection management
8.18         Information and reference service
7.43         Outreach
8.23         Knowledge organisation
8.13         Pedagogy
8.66         Internet search portals (like Europeana)
8.11         International cooperation

5.44         Do you think that fewer people will work in archives, libraries and museums in the future?

What do you see as the potential barriers to using ALMs? (1 = Completely disagree, 10 = Completely agree)

5.15         Lack of convenience
5.52         People get what they want easier from other places
(illegible)  Lack of sense of community-ownership (ALMs are not for me)
8.62         Lack of knowledge of what and how to find things in ALMs
5.39         ALMs are not for all

The following factors are important in getting more users to ALMs (1 = Strongly disagree, 10 = Strongly agree)

6.9          Busy lifestyle
6.75         Museums, libraries and archives are important even if they are not used
8.5          New technology
7.31         User education and training
8.18         Personalised coaching and service of individual users
8.28         Closer cooperation with schools
8.49         Marketing
7.78         ALMs should commit themselves more to societal issues
(illegible)  Better service quality
5.86         Cooperation with commercial actors

The use of museums, libraries and archives is strongly dependent on (1 = Strongly disagree, 10 = Strongly agree)

(illegible)  Gender
6.69         Age
7.04         Level of education
7.13         Social class
5.79         Income
5.99         Ethnicity


Table 1: Descriptive statistics of the statements on a 10-point Likert scale.


The summary of the descriptive statistics of the sample is presented in Table 1. The fact that ALMs have 
been perceived to be important in the past was not considered to be enough to justify that they would be 
needed in the future (mean 5.22, sd 3.04, median 5). The difference was significant with Wilcoxon W=4088, 
sig. 1.123e-11 and up. In spite of the general tendency to reject the significance of historical justification, 
not all respondents agreed with the idea (max 10). There was also a moderate significant correlation between 
the intrinsic value and historical relevance of the institutions (Table 2). Interestingly enough, only a few
significant statistical correlations (discussed below) could be observed between the statements and the
background variables of age, gender, education, employer (type of ALM institution) or the self-identification
of the informants as librarians, information specialists, archivists or museum professionals.

When it comes to the means of achieving the goals, the respondents valued the different
assets of the ALM institutions almost equally. The production of materials scored lower than the rest of the
factors. The difference was significant (W = 8437, p-value = 0.04972), but low. Cooperation was valued in
general (median 9 for all cooperation related questions) although somewhat less in the context of collection
management and outreach (median 8). The differences were insignificant.

The respondents considered that the lack of knowledge about what and how to find things in ALMs is the
most significant reason (W = 4621.5, p-value = 2.011e-08) of non-use while the lack of convenience was 
ranked lowest, although not significantly below the argument that ALMs are not for all. The ranking order 
of the means gives an impression that the principal issues of non-use were perceived to relate to the lack of 
knowledge and commitment from the side of the users instead of being dependent on the services and 
offerings of the ALM institutions. 
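
The W values reported above come from Wilcoxon tests run in R. The paper does not state which variant was used. Assuming the unpaired two-sample (rank-sum) form, in which W is the rank sum of the first group minus n(n + 1)/2, the statistic can be sketched in Python as follows. The helper names and the two short answer lists are hypothetical, and the p-value computation is left out.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """Two-sample W: rank sum of x in the pooled data minus n_x * (n_x + 1) / 2."""
    ranks = average_ranks(list(x) + list(y))
    n_x = len(x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


# Hypothetical Likert answers to two statements (not survey data).
group_a = [9, 8, 10, 7]
group_b = [5, 6, 8, 4]
w = rank_sum_w(group_a, group_b)
print(w)
```

W counts the pairs in which an answer from the first group exceeds an answer from the second, with ties counted as one half, so it ranges from 0 to the product of the two group sizes.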

According to the respondents, the best methods of attracting more users to the ALMs are new
technology (mean 8.5, sd 1.66, median 9) and marketing (mean 8.49, sd 1.81, median 9). In contrast, the
respondents did not believe in the positive effects of cooperating with commercial actors (mean 5.86, sd
2.78, median 6, significant difference to other methods with W = 9837, p-value = 4.014e-05). These doubts
were confirmed by the low scores for the statement that commercial activity supports the core activities of
the ALMs (mean 4.39, sd 2.5, median 4). A closer look at the standard deviations and minimum and
maximum values shows, however, how controversial the topic is.

The respondents believed that the use of the ALMs depends on the level of education, social class
and age more than on ethnicity, income or gender (W = 6600, p-value = 0.04081). On the basis of a raw
ranking of the highest median scores of prominent issues (9.5-10, see Table 1), it seems that the professionals
considered that ALMs could help to develop critical capacity and a sense of discrimination, but that they
need to reassert their public function.

iConference 2014 Isto Huvila

There were some differences between different groups within the sample. Male respondents were
less inclined (mean 7.02 vs. 8.23, p=0.003233) to believe that ALMs play an important role in the shaping
of the civic role and identity, that the role should increase (7.37 vs. 8.66, p=0.0004916), and that the ALMs
need to reassert their public function (7.74 vs. 9.09, p=0.0002082). Females were more inclined to believe in the
significance of conversation skills (8.24 vs. 7.46, p=0.03843), personal knowledge creation (8.51 vs. 7.25,
p=0.001280) and the production of materials (7.78 vs. 6.98, p=0.04335). They also rated the significance
of outreach (7.93 vs. 6.79, p=0.02081) and pedagogy (8.47 vs. 7.58, p=0.03919) higher than males. Further, the
female respondents believed that user education and training (7.74 vs. 6.62, p=0.007245), personalised
coaching (8.57 vs. 7.57, p=0.008506) and cooperation with schools (8.55 vs. 7.81, p=0.04130) are important.
Males were less inclined to believe that the lack of community ownership is an important contributing factor
to the non-use of the ALMs (6.40 vs. 7.47, p=0.02238). Female respondents placed more trust in public libraries
as a source of information than males (8.05 vs. 7.42, p=0.04541). Females were also more positive towards
the inclusion of popular culture in the ALMs (7.66 vs. 6.57, p=0.02278) and the prioritisation of high
intellectual standards (7.65 vs. 6.08, p=0.002348).

The respondents with higher education were less inclined to believe in the significance of material
production as a success factor (p=0.007596), in the significance of collaboration
on Internet search portals (p=0.02744) and in the power of marketing (p=0.02519). The analysis also gave
indicative evidence that the respondents who worked in museums were more positive toward the significance
of outreach, co-creation together with users and cooperation with schools, and more inclined to see a link
between cultural and societal engagement, whereas library employees tended to be least positive on the same
issues. Museum professionals were also more positive towards the capability of the ALMs to develop critical
capacity and a sense of discrimination, and more inclined to consider that age and ethnicity are significant
factors that determine the use of the ALMs than their colleagues in libraries or archives.

Correlation analysis of the responses (rcorr and spearman.test in pspearman) revealed generally
relatively low correlations. The highest correlation coefficients are summarised in Table 2.


Statement | Correlating statement | Spearman’s rho | p
Museums, libraries and archives play an important role in shaping civic and urban identity. | Apathy in information seeking and participation in cultural activities causes apathy in voting behaviour? | 0.41 | <0.001
(as above) | Archives, libraries, and museums should provide services, which prioritise high intellectual standards and, at the same time, promote equity and social inclusion? | 0.43 | <0.001
The fact that ALMs have been perceived to be important in the past is enough to justify why they are needed in the future. | Museums, libraries and archives are important even if they are not used | 0.41 | <0.001
Cooperation with commercial actors as an important factor in getting more users to ALMs | Commercial activity supports the core activities of ALMs | 0.62 | <0.001
Archives, libraries and museums would benefit of co-operation on outreach | Closer cooperation with schools as an important factor in getting more users to ALMs | 0.52 | <0.001
(as above) | “ALMs should commit themselves more to societal issues” as an important factor in getting more users to ALMs | 0.45 | <0.001
Archives, libraries and museums would benefit of co-operation on pedagogy | Closer cooperation with schools as an important factor in getting more users to ALMs | 0.50 | <0.001
Archives, libraries and museums would benefit of co-operation on Internet search portals | New technology as an important factor in getting more users to ALMs | 0.46 | <0.001
(as above) | User education and training as important factors in getting more users to ALMs | 0.43 | <0.001
Lack of knowledge of what and how to find things in ALMs as a potential barrier to using an ALM | Archives, libraries, and museums can help people to develop a critical capacity and a sense of discrimination | 0.44 | <0.001
New technology as an important factor in getting more users to ALMs | Better service quality as an important factor in getting more users to ALMs | 0.41 | <0.001


Table 2: Summary of the correlation analysis (rcorr and spearman.test in pspearman) of the statements. 
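
As an illustration of the statistic reported in Table 2, the following minimal sketch computes Spearman's rho as the Pearson correlation of rank-transformed responses, with tied values given their average rank. It is written in plain Python for illustration only and is not the analysis code of the study, which used rcorr and spearman.test in R. The function names and the example responses are hypothetical, and the sketch does not compute p-values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical Likert-scale answers of eight respondents to two statements.
    statement_a = [5, 4, 4, 3, 2, 5, 1, 3]
    statement_b = [4, 4, 5, 3, 1, 5, 2, 2]
    print(round(spearman_rho(statement_a, statement_b), 2))
```

Ranking the answers first is what makes the measure suitable for ordinal survey scales: only the order of the responses matters, not the numeric distance between scale points.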


The clustering of the statements was tested using factor analysis (factanal, varimax-rotation) with three to 
six factors. The analyses revealed no significant correlation patterns between the groups and individual 
variables. A combined analysis of the correlations and descriptive statistics provides, however, indicative 
evidence of the relevance of certain thematic areas of interest that pertain to the future role and strategies 
of the institutions. 


5 Discussion 


The general trend of the responses was, rather unsurprisingly, that the ALMs have a significant societal role 
to play even in the future. The major finding of this study is, however, that the respondents lacked consensus 
about the essence of the future role of the ALMs and especially about the means to maintain, increase and 
reassert it. There were differences between the opinions of the respondents with a museum background, 
especially in comparison to library professionals (archivists were mostly positioned in the middle 
between the extremes), but in general the differences between the groups were small, as was the 
influence of background factors. The findings are largely consistent with the earlier literature (e.g. 
Gilliland-Swetland, 2000; Rosa et al., 2011; Sundqvist, 2007), including the results of the study of the attitudes of 
the British general population by Usherwood et al. (2005a), which was used as a premise for formulating the survey 
instrument of the present study. In contrast to the (British) public, the (Swedish) professionals tended to 
be more sceptical about the intrinsic value of the ALMs. 

The study has some evident limitations. The similarity of the attitudes of the respondents and the 
opinions presented in the literature is not surprising considering the fact that the respondents are 
undoubtedly aware of the general lines of the public and professional debate. The high representation of 
library professionals (especially in contrast to museum professionals) in the sample is a likely source of bias 
that has to be taken into account when interpreting the results. An additional source of bias is the nature 
of the present study and its focus on the investigation of a selection of dominant themes found in an 
extensive review of the earlier literature that does not account for the possible significance of other factors. 
As a whole, the studied material represents a convenience sample, but it is assumed that the dropout is 
likely to be higher among those professionals with a less explicit vision or agenda for the future, and lower 
among those who are more likely to make a major difference for the future of the institutions. Therefore, it 
is plausible to suggest that the material can provide a useful insight into the major themes of how the ALM 
professionals conceptualise the future of their institutions. There is an unknown bias in the material that 
makes it impossible to generalise the results as is and special care has to be taken when discussing the 
conclusions of the study outside its empirical context. 

The analysis shows that the respondents had a discernible tendency to externalise (Theme 1) the 
principal challenges that face their institutions, and to see ALMs as intrinsically stable establishments. The 
professionals esteem and trust their employers as sources of information, consider that the ALMs are 
important and have a prominent role in the future society. Instead of seeing any major shortcomings in 
their offerings, the respondents considered that a major challenge of the ALMs is an insufficient engagement 
of the professionals in the interaction with the public (e.g. Casey & Savastinuk, 2007; Genoways, 2006b; 
Gilliland-Swetland, 2000). The lack of convenience and ownership were considered to be significantly less 
important factors. The respondents did not seem to share the opinion that the challenges are alarming. The 
lack of knowledge about the existing services among the general public divided opinions. The preference for 
perceiving marketing as a meaningful method of getting more users to the ALMs, and the perceived need 
to reassert the public function of the ALMs scored high but divided opinions (Table 1). In spite of the 
presence of some critical voices, marketing (e.g. Ambrose & Paine, 2006, pp. 23-36; Cerquetti, 2010; Mi & 
Nesta, 2006; Smith, 2003) and the urges to focus on attracting the general public to use the existing services 
(e.g. Casey & Savastinuk, 2007) are prevalent in the professional ALM literature. 

Many of the respondents seemed to be emphatic about the significance of technology and were 
inclined to see a link between service quality and the use of technology (Theme 2). The correlation of the 
statements “New technology is an important factor in getting more users to the ALMs” and “Better service 
quality is an important factor in getting more users to the ALMs” gives an indication of a belief in the 
interdependence of technology adoption and good service. New technology was seen as an important factor 
in attracting more users. The beneficiality of cooperation on Internet search portals also scored high, 
even if it divided opinions somewhat more than technology adoption. A propensity to perceive the role of 
technology as a decisive factor has been documented in the earlier studies in all individual ALM fields (e.g. 
Carrozzino & Bergamasco, 2010; Casey & Savastinuk, 2007; Flinn, 2010; Jimerson, 2004; Kelly et al., 2009; 
Srinivasan et al., 2009). The tendency is especially typical in the ALM literature (e.g. Nesta & Mi, 2011; 
Pastore, 2009). Earlier comparisons of the priorities of the librarians and library users have shown in 
several instances that the professionals tend to place a significantly greater emphasis on the necessity and 
beneficiality of the new technology whereas library users tend to emphasise books as the major asset of 
libraries (Rosa et al., 2011; Sinikara, 2007; Wagman, 2011). 

The popularity of the idea of perceiving the ALMs as a public good (Theme 3) is supported by the 
high levels of agreement with the statements about the societal role of the ALMs and the correlation of the 
statement “Museums, libraries and archives play an important role in shaping civic and urban identity” 
with the statements “Apathy in information seeking and participation in cultural activities causes apathy 
in voting behaviour?” and “Archives, libraries, and museums should provide services which prioritise high 
intellectual standards and, at the same time, promote equity and social inclusion”. The correlation between 
the claim that the ALMs play a role in developing critical capacity and the claim that the principal barriers to 
using ALMs are related to the lack of knowledge also suggests a view that the ALMs should take a more active 
societal role. In contrast to the earlier collection-centric paradigm of the ALM institutions, the respondents 
agreed with the opinions expressed in the literature about the need to operationalise the relevance of the 
ALMs in accordance with the expectations of the contemporary society (e.g. Dempsey, 2000; Genoways, 
2006b; Hickerson, 2001) with a particular emphasis on their relevance in the local societies and in providing 
services for the general public in order to prevent social exclusion (e.g. Brown & Davis-Brown, 1998; 
Gilliland-Swetland, 2000; McKemmish et al., 2005; Torstensson, 2002; Usherwood et al., 2005a). The idea 
of the ALMs as a public good is further emphasised by the preference of the majority of the respondents to 
keep the ALMs separate from any commercialist tendencies. The idea of the beneficiality of commercial 
cooperation scored poorly, and even the statement about the complementary role of the ALMs in the context 
of the popular culture scored lower than several other related statements. At the same time, however, a 
minority of the respondents were strongly in favour of cooperating with commercial actors and inclined to 
appreciate the emerging benefits of such collaborations (Correlation in Table 2). The potential conflict of 
commercial ideologies, and the predominantly non-commercial image of the ALMs have been documented 
in the literature. The benefits of commercial cooperation and commercialist tenets have been acknowledged 
in the literature (e.g. Evjen & Audunson, 2009; Griffin, 2008). Such tendencies are apparent, even if 
somewhat implicitly, in the ALM related outreach, marketing and management literature (e.g. Ambrose & 
Paine, 2006, pp. 23-36; Casey & Savastinuk, 2007; Cerquetti, 2010; Galani & Chalmers, 2010; Mi & Nesta, 
2006; Smith, 2003). A possible reason for the emphasis of this particular question in the results can be that 
the present survey was run at the time when the Swedish library community was engaged in debating the 
decision of the municipal council of the municipality of Nacka to submit a request for tender for its public 
library services (Rennemark, 2011). Related proposals were discussed at the time also elsewhere, most 
notably in Britain (Downey et al., 2010; Woolley, 2011). 

According to the distribution of the scores on the questions relating to the significance of the 
intrinsic value of the ALMs (Theme 4), it seems that some of the respondents were relatively consistent 
in their perceptions of an inherent relevance of the ALMs. Similarly conservative tendencies are 
discernible also in the literature. Some users of the ALMs have been reported to consider the 
historical judgment and an intrinsic value of the existence of the ALM institutions as a significant reason 
for their continuing relevance (Usherwood et al., 2005a). Part of the idea can be traced back to nostalgia, 
but at the same time, it may be taken as an indication of the persistence of certain ALM specific values. In 
the archival literature, Gilliland (2000) and Duranti (1999) have emphasised the significance of the enduring 
values of archival theory and practice. In the context of museum education, Spock (2006) has discussed the 
continuing relevance of the fundamental notions of curiosity and collecting. In the library literature, the 
results of many user studies have reminded reformists of the existence of a large group of faithful library 
users who are in favour of highly traditional library services (Rosa et al., 2011; Wagman, 2011) even if the 
conservative voices tend to be in the minority in the public library debate, which is often dominated by 
reformist ideals. 

The correlation of the preference for outreach, the increased cooperation with schools and the 
engagement in societal issues seems to indicate that some of the respondents see an active outreach (Theme 
5) to the schools and the society as a relevant strategy for their institutions. The idea of a closer engagement 
with users is not absent from the literature. For instance, the 2.0 phenomenon (e.g. Casey & Savastinuk, 
2007; Flinn, 2010) and the general suggestions of the significance of user orientation and better service 
encounter (Singh, 2009) have emphasised the need to be more active in engaging both existing and potential 
users. Eryaman (2010) discusses a more radical approach to the active engagement with the users on the 
basis of the concept of “border pedagogy” of Giroux. The central underpinning of the urges to put more 
focus on outreach activities seems to be a determination to take the initiative and to actively reassert the 
function of the ALM institutions in the contemporary society. The point of view is in a direct contrast with 
the traditional ideals of neutrality, objectivity and impartiality of the ALM professionals (e.g. Cook, 2001; 
Hooper-Greenhill, 2007; Stover, 2004). Therefore it is not surprising that the idea of an active engagement 
is perceived as controversial and is not shared by all respondents (cf. the spread of the responses). 

The generally high scores in pedagogy related statements and the positive correlation of “Archives, 
libraries and museums would benefit of co-operation on pedagogy” and “Closer cooperation with schools as 
an important factor in getting more users to ALMs” can be interpreted to suggest a preference for an 
additional approach to interacting with the public, that of pedagogy (Theme 6). On the basis of the 
correlation, it is conceivable that a group of respondents frame the mission of the ALM institutions in 
pedagogical terms. The data is consistent with the literature in that a part of the ALM professionals perceive 
that a closer cooperation with schools is a useful approach for attracting more visitors to the ALMs and for 
positively engaging the institutions with the society. The two paradoxes of the approach are that the ALM 
professionals often lack an in-depth pedagogical education (e.g. Höij, 2005), and that their pedagogical role 
is not necessarily acknowledged by educators (Still, 1998). The fact that the respondents were not entirely 
unanimous about their pedagogical role shows the controversiality of the standpoint, which has also been 
documented in the literature (Julien & Genuis, 2011). 

Finally, the respondents expressed positive views of the significance of training (Theme 7). The 
correlation of “Archives, libraries and museums would benefit of co-operation on Internet search portals” 
with “New technology as an important factor in getting more users to ALMs” and “User education and 
training as important factors in getting more users to ALMs” may be interpreted to be related to a tools 
oriented view that puts emphasis on an assumed uncontroversial utility of the institutions and the 
consequent need to train their users. The role of the professionals is seen in terms of mentorship and 
facilitation rather than as direct expertise. The ALMs function primarily as resources and tools. The point 
of view is common in policy documents (e.g. European Commission, 2006; DB2, 2009), but has gained 
popularity also in the professional library literature (Stover, 2004; Harris, 2009, pp. 174-176), and in 
similarly instrumental terms, in archives (e.g. Alain & Foggett, 2007) and museums (Galani & Chalmers, 
2010), in the form of a discourse of empowerment. 

A closer look at the seven themes elaborates the general picture of the lack of consensus about the 
essence of the future role of the ALMs and especially about the means to maintain, increase and reassert it. 
It would be tempting to suggest that the diversity stems from the differences between the opinions of 
archive, library and museum professionals, but the findings show that, in spite of some institution specific 
variation, the themes are common in the entire sample of respondents. Similarly, because of the analysed 
material, it would be intriguing to see the thematic variation as a Swedish phenomenon. A comparison of 
the present findings and the international literature shows, however, that the identified themes are not 
predominantly national. 

The plurality of ideas can be simultaneously an asset and a problem. The positive views of the 
significance of the context of the ALMs, of technology, outreach, pedagogy, training, the intrinsic value of 
the institutions and their role as a public good provide useful starting points for formulating future 
strategies, but it is apparent that such a diversity can also be a weakness. Even if the contemporary 
management practices tend to favour continuous innovation and learning instead of paradigmatic 
orthodoxies (Gregg, 2011; Sadler, 2003; Wenger, 1999), the plurality of the themes emerging from the data 
analysis and the similar diversity of their theoretical and political underpinnings seem to suggest a lack 
of a clear focus rather than the presence of a productive melange. In spite of some overlap, the themes 
represent parallel rather than complementary approaches to explain the present and plan for the future. On 
a principal level, the ALM professionals seem to have embraced the criticism of the proponents of 
postmodernism (e.g. Cook, 2001) and critical theory (e.g. Henning, 2006; Leckie et al., 2010) in that the 
ALMs need to become more pluralistic, inclusive and globally representative. At the same time, the diversity 
of the emphases in the present findings shows that there is no apparent master vision of how this adaptation 
to the subtext of the dominating contemporary ideologies, or “modernisation” (Sahlén, 2005), should be 
operationalised. The externalisation of the challenges of the ALMs is an example of a group-serving (or 
here, perhaps rather institution-serving) bias (Shepperd et al., 2008), the classical human propensity to 
attribute failures to external factors. The conceptualisations of the main drivers of change in terms of 
training (of the ’users’), pedagogy or technology are comparable, albeit less direct, forms of the same mindset. 
Emphasising the significance of instrumental, and as such essentially external, factors rather than strategic 
priorities can be counter-productive. It is apparent that the results of the present type of survey study do 
not give direct answers to the question of how the mission of the ALMs should be formulated in the future. 
It remains to be determined whether it should be common to all ALMs, include some common elements, or be 
specific to the particular types of ALM institutions. What is clear, however, is that such a formulation 
would be highly helpful for individual institutions and the ALM sector as a whole to guide and direct their 
strategic planning. 


6 Conclusions 


The findings of the present study show that there are several competing ideas of the future role of the 
institutions and of the strategies for reaching a diversity of explicit and implicit goals that seem to be largely 
unrelated to the institutional background of the ALM professionals. A central implication of the apparent lack 
of a consensus is that there is a need to discuss and define the future of the ALMs on a profound level of 
their societal role. The diversity of opinions and such primarily practice-oriented visions of the future as 
the 2.0 phenomenon in the last decade or the comparable movements for promoting participation and 
(unspecific) openness can be helpful in shaping and reshaping the future of the institutions. At the same 
time, it is apparent that they do not have the required theoretical depth to function as a positive ’orthodoxy’ 
(as expressed by Lappin, 2010) for providing a common ground for the discussion. Implicit balancing 
between the maintenance of the role of being a public good with an outward bound mission of educating 
people, endorsing an image of a post-modern space of empowerment and implementing a neo-liberalistic 
agenda of measuring the relevance of the ALMs in terms of monetary benefits is not sustainable. Even if 
these disparate approaches may subscribe to the relevance of certain abstract ideals, they have fundamental 
differences. It would probably be too hasty to deny the possibility of finding synergies between the 
approaches altogether, but in order to succeed, the central tenets of the individual institutions and the 
collective role of the ALMs in the society need to be articulated in explicit enough terms to form a common 
ground for the work in the ALM institutions. ALMs need to choose whether they are institutions of 
enlightenment, postmodern spaces of empowerment, result-orientated financial units, or perhaps, seats of 
something that is yet to be invented. 


7 References 


(2009). Digital Britain: Final report. Tech. rep., Department for Culture, Media and Sport and 
Department for Business, Innovation and Skills, London. URL 
http://www.culture.gov.uk/images/publications/digitalbritain-finalreport-jun09.pdf 

(2009). Einleitung. In K. Ebeling, & S. Günzel (Eds.) Archivologie: Theorien des Archivs in Philosophie, 
Medien und Künsten, (pp. 7-26). Berlin: Kadmos. 

Abram, S. (2007). Web 2.0, Library 2.0, and Librarian 2.0: Preparing for the 2.0 World. In S. Ricketts, C. 
Birdie, & E. Isaksson (Eds.) Library and Information Services in Astronomy V, vol. 377 of 
Astronomical Society of the Pacific Conference Series, (pp. 161-167). 

Alain, A., & Foggett, M. (2007). Towards community contribution: Empowering community voices on- 
line. In J. Trant, & D. Bearman (Eds.) Museums and the Web 2007: Proceedings. Toronto: 
Archives & Museum Informatics. 

Ambrose, T., & Paine, C. (2006). Museum Basics, 2nd Edition. Routledge. 




iConference 2014 Isto Huvila 


Anderson, K. (2007). Global archive and record-keeping research agendas: Encouraging participation and 
getting over the hurdles. Journal of the Society of Archivists, 28(1), 35-46. 

Bailey, E. (2006). Researching museum educators’ perceptions of their roles, identity, and practice. 
Journal of Museum Education, 31(3), 175-198. 

Baker, D. (2007). Combining the best of both worlds: the hybrid library. In Digital Convergence: 
Libraries of the Future, (pp. 95-105). Springer London. 

Barry, R. (2010). Opinion piece - electronic records: now and then. Records Management Journal, 20(1), 
157-171. 

Bates, M. J. (2006). Fundamental forms of information. Journal of the American Society for Information 
Science and Technology, 57(8), 1033-1045. 

Bowitz, E., & Ibenholt, K. (2009). Economic impacts of cultural heritage - research and perspectives. 
Journal of Cultural Heritage, 10(1), 1-8. 

Brown, R. H., & Davis-Brown, B. (1998). The making of memory: the politics of archives, libraries and 
museums in the construction of national consciousness. History of the Human Sciences, 11(4), 17- 
32. 

Buckland, M. (1991). Information as thing. JASIS, 42(5), 351-360. 

Buckland, M. (1997). What is a document? Journal of the American Society for Information Science and 
Technology, 48(9), 804-809. 

Carrozzino, M., & Bergamasco, M. (2010). Beyond virtual museums: Experiencing immersive virtual 
reality in real museums. Journal of Cultural Heritage, 11(4), 452-458. 

Casey, M., & Savastinuk, L. (2007). Library 2.0: A Guide to Participatory Library Service. Medford, NJ: 
Information Today. 

Cerquetti, M. (2010). Dall’economia della cultura al management per il patrimonio culturale: presupposti 
di lavoro e ricerca. Il capitale culturale. Studies on the Value of Cultural Heritage, 1(1). 

Cook, T. (1997). What is past is prologue: A history of archival ideas since 1898, and the future paradigm 
shift. Archivaria, 43(1). 

Cook, T. (2001). Archival science and postmodernism: new formulations for old concepts. Archival 
Science, 1(1), 3-24. 

Costantino, T. (2012). How does your public library support democracy? In Proceedings of the 2012 
iConference, iConference ’12, (pp. 468-470). New York, NY, USA: ACM. 

Dempsey, L. (2000). Scientific, industrial, and cultural heritage: a shared approach: A research framework 
for digital libraries, museums and archives. Ariadne, (22). 

Donaldson, T., & Preston, L. E. (1995). The stakeholder theory of the corporation: Concepts, evidence, 
and implications. The Academy of Management Review, 20(1), 65-91. 

Downey, A., Kirby, P., & Sherlock, N. (2010). Payment for success - how to shift power from Whitehall 
to public service customers. Tech. rep., KPMG. 

Duncker, E. (2002). Cross-cultural usability of the library metaphor. In Proceedings of the 2nd 
ACM/IEEE-CS joint conference on Digital libraries, JCDL ’02, (pp. 223-230). New York, NY: 
ACM. 

Duranti, L. (1999). Concepts and principles for the management of electronic records, or records 
management theory is archival diplomatics. Records Management Journal, 9(3), 149-171. 

Duranti, L. (2010). Concepts and principles for the management of electronic records, or records 
management theory is archival diplomatics. Records Management Journal, 20(1), 78-95. 

Ebeling, K., & Günzel, S. (Eds.) (2009). Archivologie: Theorien des Archivs in Philosophie, Medien und 
Künsten. Berlin: Kadmos. 

Eryaman, M. Y. (2010). The public library as a space for democratic empowerment: Henry Giroux, radical 
democracy, and border pedagogy. In G. J. Leckie, L. M. Given, & J. Buschman (Eds.) Critical 
theory for library and information science exploring the social from across the disciplines, (pp. 
131-141). Santa Barbara, CA: Libraries Unlimited. 

European Commission (2006). Commission recommendation of 24 August 2006 on the digitisation and 
online accessibility of cultural material and digital preservation (2006/585/EC). Official Journal of 
the European Union, (236), 28-30. 

Evjen, S., & Audunson, R. (2009). The complex library: Do the public’s attitudes represent a barrier to 
institutional change in public libraries? New Library World, 110(3), 161-174. 

Featherstone, M. (2006). Archive. Theory, Culture & Society, 23(2-3), 591-596. 

Flinn, A. (2010). “An attack on professionalism and scholarship?”: Democratising archives and the 
production of knowledge. Ariadne, (62). 

Foucault, M. (2002). The Archaeology of Knowledge. London: Routledge. L’Archéologie du savoir first 
published 1969 by Éditions Gallimard. 

Fuller, T., & Loogma, K. (2009). Constructing futures: A social constructionist perspective on foresight 
methodology. Futures, 41(2), 71-79. 

Galani, A., & Chalmers, M. (2010). Empowering the remote visitor: supporting social museum experiences 
among local and remote visitors. In R. Parry (Ed.) Museums in the digital age, (pp. 159-169). 
London: Routledge. 

Genoways, H. H. (Ed.) (2006a). Museum Philosophy for the Twenty-first Century. Lanham: Altamira 
Press. 

Genoways, H. H. (2006b). To the members of the museum profession. In H. H. Genoways (Ed.) Museum 
Philosophy for the Twenty-first Century, (pp. 221-234). Lanham: Altamira Press. 

Gilliland, A., & Mckemmish, S. (2004). Building an infrastructure for archival research. Archival Science, 
4(3), 149-197. 

Gilliland-Swetland, A. J. (2000). Enduring paradigm, new opportunities: The value of the archival 
perspective in the digital environment. Tech. Rep. 89, CLIR, Washington, DC. 

Granström, C. (2002). Law and information technology: Swedish views: An anthology produced by the IT 
Law Observatory of the Swedish ICT Commission, vol. SOU 2002:112, chap. Archives of the 
Future, (pp. 99-106). Stockholm: Swedish Government Official Reports. 

Gregg, M. (2011). Work’s Intimacy. Cambridge: Polity. 

Griffin, D. (2008). Advancing museums. Museum Management and Curatorship, 23(1), 43-61. 

Harris, R. (2009). “Their little bit of ground slowly squashed into nothing”: Technology, gender, and the 
vanishing librarians. In G. J. Leckie, & J. Buschman (Eds.) Information technology in 
librarianship: new critical approaches. Westport, CN: Libraries Unlimited. 

Henning, M. (2006). Museums, media and cultural theory. Maidenhead: Open University Press. 

Hickerson, H. (2001). Ten challenges for the archival profession. American Archivist, 64(1), 6-16. 

Hill, M. W. (1999). The Impact of Information on Society. London: Bowker-Saur. 

Höij, P. (2005). Information, förvaltning och arkiv - en antologi, vol. 20 of Arkiv i Norrland, chap. Bland 
reproduktion och silverfiskar: Ett försök till en definition av arkivpedagogik, (pp. 158-185). 
Härnösand: Landsarkivet i Härnösand. 

Holmberg, K., Huvila, I., Kronqvist-Berg, M., & Widén-Wulff, G. (2009). What is library 2.0? Journal of 
Documentation, 65(4), 668-681. 

Hooper-Greenhill, E. (2007). Education, postmodernity and the museum. In Museum revolutions: How 
museums change and are changed, (pp. 367-377). London: Routledge. 

Huvila, I. (2008). Participatory archive: towards decentralised curation, radical user orientation and 
broader contextualisation of records management. Archival Science, 8(1), 15-36. 

Jimerson, R. C. (2004). The future of archives and manuscripts. OCLC Systems & Services, 20(1065- 
075X), 11-14. 




Julien, H., & Genuis, S. K. (2011). Librarians’ experiences of the teaching role: A national survey of 
librarians. Library & Information Science Research, 33(2), 103-111. 

Kearns, J., & Rinehart, R. (2011). Personal ontological information responsibility. Library Review, 60(3), 
230-245. 

Kelly, B., Bevan, P., Akerman, R., Alcock, J., & Fraser, J. (2009). Library 2.0: balancing the risks and 
benefits to maximise the dividends. Program: electronic library and information systems, 43(3), 
311-327. 

Kungliga biblioteket (2011). Bibliotek 2010. Tech. rep., Stockholm. 

Lankes, R. D., Silverstein, J. L., Nicholson, S., & Marshall, T. (2007). Participatory networks: The library 
as conversation. Information Research, 12(4). URL http://InformationR.net/ir/12-4/colis05.html 

Lappin, J. (2010). What will be the next records management orthodoxy? Records Management Journal, 
20, 252-264. 

Latham, K. F. (2012). Museum object as document: Using Buckland’s information concepts to understand 
museum experiences. Journal of Documentation, 68(1), 45-71. 

Leckie, G. J., Given, L. M., & Buschman, J. (2010). Critical theory for library and information science 
exploring the social from across the disciplines. Santa Barbara, CA: Libraries Unlimited. 

Lim, S., & Liew, C. L. (2011). Metadata quality and interoperability of GLAM digital images. Aslib 
Proceedings, 63(5), 484-498. 

Lund, N. W., & Buckland, M. (2008). Document, documentation, and the document academy: 
introduction. Archival Science, 8(3), 161-164. 

Maceviciute, E., & Wilson, T. (2009). A Delphi investigation into the research needs in Swedish 
librarianship. Information Research, 14(4). URL http://informationr.net/ir/14-4/paper419.html 

Manzuch, Z. (2009). Archives, libraries and museums as communicators of memory in the European Union 
projects. Information Research, 14(2). URL http://informationr.net/ir/14-2/paper400.html 

Marstine, J. (2006). New museum theory and practice: an introduction. Malden, MA: Blackwell. 

Martinon, J.-P. (2006). Museums and restlessness. In H. H. Genoways (Ed.) Museum Philosophy for the 
Twenty-first Century, (pp. 59-68). Lanham: Altamira Press. 

McKemmish, S., Gilliland-Swetland, A., & Ketelaar, E. (2005). “Communities of memory”: Pluralising 
archival research and education agendas. Archives and Manuscripts, (33), 146-174. 

McLeod, J. (2008). Records management research: Perspectives and directions. Journal of the Society of 
Archivists, 29(1), 29-40. 

Merritt, E. (2008). Museums & society 2034: Trends and potential futures. Tech. rep., Center for the 
Future of Museums, American Association of Museums, Washington DC. 

Mi, J., & Nesta, F. (2006). Marketing library services to the Net Generation. Library Management, 27(6), 
411-422. 

Nesta, F., & Mi, J. (2011). Library 2.0 or library iii: returning to leadership. Library Management, 
32(1/2), 85-97. 

Norberg, A., Dahlin, M., & Hjorth, B. (2009). Går det att vara så kategorisk? - debatten om 
arkivarierollen fortsätter. DIK Debatt och opinion. 


O'Connor, L. (2009). Information literacy as professional legitimation: the quest for a new jurisdiction. 
Library Review, 58(7), 493-508. 

Pastore, E. (2009). The future of museums and libraries: A discussion guide. Tech. Rep. IMLS-2009-RES- 
02, Institute of Museum and Library Services, Washington, D.C. 

Rayward, W. B., & Jenkins, C. (2007). Libraries in times of war, revolution, and social change. Library 
Trends, 55(3), 361-369. 

Rennemark, A.-K. (2011). Bilder av biblioteket: en ideologianalys av debatten omkring privatisering av 
folkbibliotek. Master’s thesis, Lund University, Lund. 


Ridolfo, J., Hart-Davidson, W., & McLeod, M. (2010). Balancing stakeholder needs: Archive 2.0 as 
community-centred design. Ariadne, (63). 

Rosa, C. D., Cantrell, J., Carlson, M., Gallagher, P., Hawk, J., Sturtz, C., Cellentani, D., Dalrymple, T., 
Olszewski, L., & Gauder, B. (2011). Perceptions of libraries, 2010: Context and community. Tech. 
rep., OCLC, Dublin, OH. 

Sadler, P. (2003). Strategic Management. London: Kogan Page, 2 ed. 

Sahlén, T. (2005). ABM - utveckling. In M. Molin, & B. Wittgren (Eds.) Om ABM: En antologi om 
samverkan mellan arkiv, bibliotek och museer, (pp. 50-58). Härnösand: ABM Resurs: Länsmuseet
Västernorrland. 

Shepperd, J., Malone, W., & Sweeny, K. (2008). Exploring causes of the self-serving bias. Social and 
Personality Psychology Compass, 2(2), 895-908. 

Shilton, K., & Srinivasan, R. (2008). Participatory appraisal and arrangement for multicultural archival 
collections. Archivaria, (63), 87-101. 

Singh, R. (2009). Does your library have a marketing culture? implications for service providers. Library 
Management, 30(3), 117-137. 

Sinikara, K. (2007). Ammatti, ihminen ja maailmankuva murroksessa: Tutkimus yliopistokirjastoista ja 
kirjastonhoitajista tietoyhteiskuntakaudella 1970-2005. Ph.D. thesis, University of Helsinki, 
Faculty of Theology, Department of Comparative Religion, Helsinki. 

Smith, E. H. (2003). Customer Focus and Marketing in Archive Service Delivery: theory and practice 1. 
Journal of the Society of Archivists, 24(1), 35-53. 

Spock, D. (2006). The puzzle of museum educational practice: A comment on rounds and falk. Curator: 
The Museum Journal, 49(2), 167-180. 

Srinivasan, R., Boast, R., Furner, J., & Becvar, K. M. (2009). Digital museums and diverse cultural 
knowledges: Moving past the traditional catalog. The Information Society: An International 
Journal, 25(4), 265-278. 

Statens kulturråd (2010a). Bibliotek 2009. Tech. rep., Stockholm. 

Statens kulturråd (2010b). Museer 2009. Tech. rep., Stockholm. 

Still, J. (1998). The role and image of library and librarians in discipline-specific pedagogical journals. 
Journal of Academic Librarianship, 24(3), 225-231. 

Stover, M. (2004). The reference librarian as non-expert: A postmodern approach to expertise. The 
Reference Librarian, 42(87), 273-300. 

Sundqvist, A. (2007). The use of records - a literature review. Archives & Social Studies, 1(1), 623-653. 

Torstensson, M. (2002). Libraries and society — the macrostructural aspect of library and information 
studies. Library Review, 51(3), 211-220. 

Trant, J. (2009). Emerging convergence? thoughts on museums, archives, libraries, and professional 
training. Museum Management and Curatorship, 24(4), 369-387. 

Usherwood, B., Wilson, K., & Bryson, J. (2005a). Relevant repositories of public knowledge?: Libraries, 
museums and archives in 'the information age'. Journal of Librarianship and Information Science,
37(2), 89-98. 

Usherwood, B., Wilson, K., & Bryson, J. (2005b). Relevant repositories of public knowledge? perceptions 
of archives libraries and museums in modern britain. Tech. rep., The Centre for the Public 
Library and Information in Society, Department of Information Studies, University of Sheffield, 
Sheffield. 

VanderBerg, R. (2012). Converging libraries, archives and museums: overcoming distinctions, but for 
what gain? Archives and Manuscripts, 40(3), 136-146. 

Wagman, A. K. (2011). Olika syn på saken: Folkbiblioteket bland användare, icke-användare och
personal. Tech. rep., Svensk biblioteksförening, Stockholm.


Wavell, C., Baxter, G., Johnson, I., & Williams, D. (2002). Impact evaluation of museums, archives and 
libraries: available evidence project. Aberdeen Business School, The Robert Gordon University, 
Aberdeen. 

Wenger, E. (1999). Communities of Practice: Learning, Meaning, and Identity. Cambridge: Cambridge 
University Press. 

Woolley, J. (2011). Community managed libraries. Tech. rep., Museums Libraries and Archives Council 
Grosvenor House, Birmingham. 


8 Table of Tables 


Table 1: Descriptive statistics of the statements on a 10-point Likert scale

Table 2: Summary of the correlation analysis (rcorr and spearman.test in pspearman) of the statements




Digital Inclusion for Migrant Millennials: Improving the ICT Landscape of Yakima 
Valley Schools 


Bryan Dosono¹
¹ Syracuse University


Abstract 

Digital inclusion seeks to bring the benefits of information and communication technologies (ICT) to 
vulnerable populations such as low-income families, residents of rural communities, seniors, disabled 
citizens, at-risk youth, immigrants, refugees and people of color. Despite its thriving agricultural industry, 
the Yakima Valley in Washington State is designated as an economically distressed area with low wages, 
significant unemployment and high poverty levels. The area's agricultural emphasis attracts a large 
population of migrant workers who are generally perceived to be information poor, meaning they face 
major challenges with finding and using greatly needed everyday information. Little research in ICT 
access for migrant populations exists because differences in language, culture and other factors make 
migrant workers and their youth a particularly difficult population to study. Using the Yakima Valley 
as a research site, this work examines current digital inclusion efforts towards migrant youth and how 
rising workers of the millennial generation can better participate in today’s digital economy. This research 
involves reviewing literature on the information ecosystem of the Yakima Valley, interviewing school 
district administrators for their insight into the current ICT landscape of their facilities and evaluating 
current educational technology access strategies within the region. The work provides recommendations 
aimed at influencing policy and awareness for digital inclusion within the school system.


1 Introduction


The emergence of information and communication technologies (ICT) has transformed today’s economy 
through the diffusion of new tools. Personal computers and the internet improve how modern society 
interacts, learns and earns a living. However, there are multiple disparities associated with ICT deployment 
and access. According to Chakraborty & Bosman (2005), factors such as race, education and income
are leading causes of those disparities. Groups with higher incomes and better education,
particularly Caucasians and Asian Americans, are adopting newer ICTs faster and are connecting more 
often to the information economy. Fairlee (2004) reports that barriers such as language have been found to 
explain low rates of computer and internet access among Americans of Latin or Hispanic descent. 
Community programs that seek to overcome the digital divide promote digital inclusion and foster the 
ability of individuals and groups to access and use ICTs. Digital inclusion policies aim to level the playing 
field of technological opportunity for underserved populations (FCC, 2012). Marginalized groups that 
experience barriers to ICT include low-income families, residents of rural communities, seniors, disabled 
citizens, at-risk youth, immigrants, refugees and people of color.


2 Literature Review


2.1 Socioeconomic history of the Yakima Valley 


The Yakima Valley was created by the flow of a river through the million acres of land in what is now 
Central Washington State. By the closing of the nineteenth century, large-scale irrigation projects helped 
local farmers realize the land’s potential. The Yakima Valley became the fruit bowl of Washington, 
producing apples, peaches, pears, cherries and grapes, as well as hops, sugar beets and asparagus (Gamboa, 
1981). Despite its thriving agricultural industry, the Yakima Valley is designated as an economically 
distressed area with low wages, significant unemployment and high poverty levels. The area's agricultural
emphasis attracts a large population of immigrant Hispanic farm workers. In the Census 2010 questionnaire, 
46% of survey respondents in Yakima County of Washington State reported themselves as persons of 
Hispanic or Latino origin (US Bureau of the Census, 2012). This number is anticipated to be higher if 
undocumented workers are taken into account. 

Unlike areas of the southwestern United States, Mexican settlement in the Yakima Valley is a 
recent phenomenon. Per Coronado (2005), large-scale Mexican immigration to the valley began during 
World War II when high demand for agricultural labor led to the enactment of the Bracero Program from 
1942 to 1964, bringing more than 35,000 Mexican laborers to Washington. Gamboa (2000) reports that 
since the Bracero Program ended in 1964, Mexican immigrants have continued to make their way to the Yakima
Valley to find employment or to unite with family members and friends who have settled in the area.


2.2 Migrant youth as infomediaries 


Fisher’s (2004) work in studying the information behavior of migrant Hispanic farm workers and their 
families in the Pacific Northwest classifies migrant workers as information poor, meaning they face major
challenges with finding and using greatly needed everyday information. Migrant workers are defined by the 
United Nations (1990) as people who work outside of their home country. Face-to-face interaction within 
personal networks is a highly favored information gathering activity for migrant workers. Secondary sources 
include radio, television and print media, although many factors determine the use of ICT among immigrant
populations. Migrant workers are more likely to trust in-person communication from their social networks 
in environments like church, school and the workplace. However, the undocumented status of some migrant 
families adds a complex layer of difficulty for deciding with whom and how they share information. 

Chu (1999) examines how children of these migrant workers play a significant role in facilitating 
the literacy interactions of their immigrant parents, relatives and friends. These children serve as 
infomediaries, or information intermediaries, and are called upon to perform adult responsibilities for their 
parents, ranging from conversational interpretation to filling out legal forms and translating information.
Children who serve as infomediaries are an asset to their families, but these expectations are imposed on 
them because of their need to balance two different cultural environments: that of their ethnic community 
and that of American society. Dependence on English-speaking children entrenches a damaging pattern
among immigrant families, as the cycle of child translators perpetuates the isolated life of immigrants.
Such differences in language, culture and other factors make migrant youth of the millennial generation,
those born in or after 1980 (Ng, Schweitzer and Lyons, 2010), a challenging population to study.


2.3 People, places and partnerships of ESD 105 


Administrators of Educational School District (ESD) 105 in the Yakima Valley developed a migrant 
education program to assist local school districts that work with a large number of migrant youth (Harding 
& Sykes, 1994). The ESD 105 Migrant Education Regional Office (MERO) serves approximately 16,400 
migrant students who attend the 25 public school districts and 23 state-approved private and tribal schools 
in Washington State. MERO works with school districts and teachers to implement federal and state 
programs that support migrant student learning and parent involvement. With support from federal Title
1 funds, MERO serves in liaison activities between the U.S. Department of Education’s Migrant Education 
Program, the Office of State Superintendent of Public Instruction, Migrant and Bilingual Education, and 
the schools, communities and parents that work directly with students. 

Wapato School District (WSD) is a public school district located in Wapato, Washington. In the 
2011-2012 school year, the district enrolled approximately 3,400 students. The student body is culturally 
diverse: 72% of the students are Hispanic and 19.4% are American Indian. Nearly one in five students are 
from the Yakama Nation. Roughly 29% of the students at WSD are identified as migrant and
100% of the students qualified for free or reduced-price meals (Table 1). WSD is one of the 25 school
districts that fall within the jurisdiction of ESD 105, and it partners with MERO to share
migrant-specific resources for its students.


• How does ESD 105 see technology fit in its mission/vision statement? Is ESD 105 engaged in any
digital inclusion programs or initiatives?

• How does ESD 105 leverage existing investments in its IT infrastructure? What is needed in
technology capacity/skill building to keep moving diverse participation forward that supports ESD
105’s work?

• Where are the strengths, weaknesses, opportunities and threats in your information technology
management department? Are there any gaps? Who is left out and at risk of being left behind in
accessing information resources?

• What is the source of funding for your technology services? How are libraries and computer labs
furnished/maintained? Do you feel that those facilities are over/under utilized? What digital
technologies are currently available for students, faculty and staff?

• What are ESD 105’s measures of success (benchmarks) for digital inclusion? Who oversees all IT
operations? How are technology strategies evaluated?

• Do you have any experience working in school districts that are predominantly white? How does
that compare with your experiences interacting with the diverse composition of students, staff and
faculty at ESD 105?

• What is your experience working in a school district that serves a significant migrant population?
Do any interesting stories come to mind from your involvement with this community?


Figure 1: Interview questions 


2010 Enrollment Statistics

Enrollment                                  Total    Percentage
October 2010 Student Count                  3,376
May 2011 Student Count                      3,291

Gender (October 2010)
Male                                        1,709     50.6%
Female                                      1,667     49.4%

Race/Ethnicity (October 2010)
American Indian/Alaskan Native                665     19.7%
Asian                                          52      1.5%
Asian/Pacific Islander                         52      1.5%
Black                                           4      0.1%
Hispanic                                    2,428     71.9%
White                                         143      4.2%
Two or More Races                              83      2.5%

Special Programs
Free or Reduced-Price Meals (May 2011)      3,291    100.0%
Special Education (May 2011)                  417     12.7%
Transitional Bilingual (May 2011)             824     25.0%
Migrant (May 2011)                            954     29.0%
Section 504 (May 2011)                         13      0.4%
Foster Care (May 2011)                          0      0.0%


Table 1: Student demographics of Wapato School District 


Source: Office of Superintendent of Public Instruction Washington State Report Card 
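
The percentages in Table 1 can be reproduced from the reported counts. The short Python sketch below is an illustrative check only, not part of the original study. It assumes that the October 2010 rows are divided by the October 2010 student count and that the Special Programs rows are divided by the May 2011 student count; the function and variable names are hypothetical.

```python
# Counts copied from Table 1 (Wapato School District).
OCTOBER_2010_TOTAL = 3376
MAY_2011_TOTAL = 3291

# Rows reported for October 2010 (assumed denominator: October 2010 count).
october_counts = {
    "Male": 1709,
    "Female": 1667,
    "American Indian/Alaskan Native": 665,
    "Hispanic": 2428,
    "White": 143,
}

# Rows reported for May 2011 (assumed denominator: May 2011 count).
may_counts = {
    "Special Education": 417,
    "Transitional Bilingual": 824,
    "Migrant": 954,
    "Section 504": 13,
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


october_shares = {k: share(v, OCTOBER_2010_TOTAL) for k, v in october_counts.items()}
may_shares = {k: share(v, MAY_2011_TOTAL) for k, v in may_counts.items()}

for label, pct in {**october_shares, **may_shares}.items():
    print(f"{label}: {pct}%")
```

Under these assumptions the computed shares agree with the percentages printed in the table, which suggests the two enrollment counts are indeed the denominators used.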


3 Methodology 


This research involved scanning literature on information behavior for migrant youth in the Yakima Valley, 
interviewing relevant school district administrators of ESD 105, assessing metrics of success and evaluating 
current technology access strategies within the region. The literature review suggested that the information 
behavior of various groups of immigrant youth varied depending on their enrollment status. Specifically, 
this research focuses on migrant youth currently enrolled in ESD 105 schools. 

Five administrators within ESD 105 were interviewed for their insight into the high-level analysis of
decision making in their respective IT departments. Respondents included the Director of the Migrant
Education Regional Office, the Director of the Educational Technology Support Center and the Director of 
the Yakima Valley Library. The Director of Technology and Assessment from Wapato School District was 
also interviewed to gain context into a particular school district of ESD 105. Although K-12 youth were the 
targeted end users of the technology provided by the school district, school administrators could provide 
context to budgetary policies and technical resource constraints where students cannot. 

Semi-structured interviews allow stakeholders the freedom to express their views in their own terms 
(Courage and Baxter, 2004). Each respondent was asked a set of predetermined questions relating to 
strategies that examine what technologies are available for migrant youth, how often key performance 
indicators of success are evaluated and how current technology access strategies are assessed. All responses 
were recorded, transcribed and coded to transform raw data into a standardized format for interpreting 
recurrent concepts and identifying emergent themes (Pickard, 2007). Additional data guiding the analysis 
of the research draw from the Washington Migrant Education Program Service Delivery Plan, which 
documents the substantiated needs of migrant students in the state, sets performance targets for meeting 
their needs and provides the general strategy for local response to their needs. 
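
One way to picture the coding step is as a tally of which codes recur across respondents. The Python sketch below is a minimal illustration under stated assumptions: the study does not describe its coding tooling, the threshold of two respondents is an arbitrary choice, and the respondent labels and codes are invented placeholders rather than data from the interviews.

```python
from collections import defaultdict


def recurrent_codes(coded_segments, min_respondents=2):
    """Map each code raised by at least min_respondents distinct respondents
    to the number of respondents who raised it."""
    respondents_by_code = defaultdict(set)
    for respondent, code in coded_segments:
        respondents_by_code[code].add(respondent)
    return {
        code: len(people)
        for code, people in respondents_by_code.items()
        if len(people) >= min_respondents
    }


# Invented placeholder segments: (respondent label, assigned code).
segments = [
    ("R1", "home internet access"),
    ("R2", "home internet access"),
    ("R2", "teacher training"),
    ("R3", "home internet access"),
    ("R3", "teacher training"),
    ("R1", "funding"),
]

themes = recurrent_codes(segments)
print(themes)
```

Counting distinct respondents rather than raw mentions keeps one talkative interviewee from making a code look like a shared theme.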


4 Findings and Discussion 


4.1 Technology usage and infrastructure 


The Federal Communications Commission’s (FCC) E-Rate Program helps provide affordable access to
telecommunications services for eligible schools and libraries, particularly those in rural and economically 
disadvantaged areas. E-Rate provides discounts on four key areas of service: telecommunications services, 
internet access, internal connections and basic maintenance of internal connections. Wapato School District 
(WSD) receives approximately $3 million in E-Rate funding, which covers approximately 90% of its IT
services. One of the limitations of the E-Rate Program is that the internet access it funds cannot be used
outside of the school district. To ensure compliance with E-Rate requirements, WSD administration completes
monthly technology audits to assess the usage of its end-user equipment like computers and telephones. 
The district utilizes its collected bandwidth capacities to shape network usage and provides access to 
network applications that integrate technology into the classroom. Additionally, WSD’s administrators 
complete over 100 walk-throughs a month in classrooms and computer labs to ensure high quality of instructional
technology in each campus facility. A high-quality Wi-Fi network is present within the district; many
instructors allow students to bring their own devices (e.g., laptops, iPads) and use them during class.

ESD 105 is affiliated with several educational initiatives that offer high-quality internet resources 
across many disciplines. Thinkfinity is the Verizon Foundation’s online professional learning community 
that provides free access to digital resources for over 60,000 educators in curriculum enhancement. In this 
service, educators connect and collaborate through themed groups, blogs and discussions to share resources 
and best practices that support 21st century teaching and learning. Thinkfinity and the Educational 
Technology Support Center at ESD 105 have agreed to work together to deliver Thinkfinity training to 
educators in Washington. 

The Washington Learning Source (WLS) is another statewide educational technology resource 
developed by ESDs of Washington State. Its mission is to provide a place for districts to choose products 
and services that meet their needs and create economic efficiencies through ESD collaboration and a 
regionally supported delivery model. For over 30 years, Washington ESDs have provided a vast array of services
to school districts for the purpose of assuring equal educational opportunities for quality education and 
lifelong learning. WLS is an expansion of the ETSC purchasing program and serves as a primary source to 
access a variety of resources available in enhancing education and teaching in the state of Washington. 

All educational technology plans within the Yakima Valley center around developing the necessary 
skills of its students to be productive members of society. Research continues to affirm the positive impact 
that effective instruction coupled with a technology-rich learning environment can have on student 
performance (West, 2011). Computers help students improve their performance on basic skills tests and are 
powerful tools for problem solving, conceptual development and critical thinking. Technology integration 
has demonstrated that students learn quickly and with greater retention when learning with the aid of 
computers. Adequate teacher training is an integral element of successful learning programs based on or
assisted by technology.


4.2 Technology training from committed staff 


The Educational Technology Support Center (ETSC) of ESD 105 works with the region’s schools to provide 
leadership and assistance in using technology to boost student success. ETSC helps teachers, administrators 
and office staff use technology for program planning, classroom integration, program assessment, research 
and grant writing. In addition to offering consultation on internet safety, legal and ethical technology use 
in K-12 education, administrators meet with ETSC to develop technology plans for district initiatives. 

Providing professional development workshops for instructors is a priority for ETSC. At no cost, 
any teacher or staff member within ESD 105 can engage in 10 days of training in technology integration 
and lesson design. Trainings from ETSC span various domains of technical skills, including learning
how to design graphics with Adobe Creative Suite programs, conducting a videoconference call and 
developing a website portal for classroom resources. All instructors are required to attend professional 
development seminars on a yearly basis to maintain their proficiency with current and upcoming technology 
tools. 

Various technology-focused programs allow for the regular training of teachers and staff. For 
instance, the Peer Coaching Program aims to increase the capacity of teachers in Central Washington
to use educational technology in the classroom effectively. The ETSC at ESD 105 has been training and 
supporting peer coaches since 2003. Already 244 instructors have trained through this program. Peer 
coaching allows two or more teachers to work together, one coaching the other, to improve individual 
instructional practice. As colleagues in the same school, they share instructional experiences through 
observing and teaching in each other’s classrooms. Peer coaching works best between beginning educators 
who have at least some experience with technology integration and experienced educators who are ready to
coach a colleague with 21st century classroom technology. 

Extra resource development such as the Prepare to Integrate Learning Opportunities with 
Technology (PILOT) tool is also supported by the ETSC. This online staff self-assessment tool determines 
an educator’s level of technology proficiency and classroom application. Based upon the results of the 
assessment, PILOT software allows educators to view and select learning opportunities throughout the state 
to advance their proficiency level. PILOT also develops a learning community for educators to meet and 
participate in statewide projects and can function as a tool for districts to use within their staff to organize
professional development efforts. 

ESD 105 evaluates how educators integrate technology in their everyday teaching through a tiered 
model of technology use in classrooms. At the first tier, the teacher uses technology to accomplish a specific 
task. Specifically, the instructor supports the learning experience by finding instructional resources on the 
internet, communicating quickly with email, posting grades and supplying other relevant information online 
for parents. At the second tier, the teacher facilitates large-group learning activities and
encourages student use of technology. In this way, the instructor enhances the learning experience by 
delivering visual presentations, conducting one-computer classroom lessons and collecting student 
assignments online. At the third tier, the teacher involves student use of technology in individualized 
learning activities. In doing so, the instructor personalizes the learning experience by authoring work online, 
managing online discussions and inventing products through programming. This model was developed by
ETSC Directors in Washington State, approved by the Office of Superintendent of Public Instruction and 
included in ESD 105’s Technology Planning support documents (Fisher, 2013). 
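
The three tiers can be read as an ordered classification of classroom activities. The Python sketch below is only an illustration of that reading: the activity labels paraphrase the examples given above, while the enum names, the lookup table and the rule of reporting the highest tier observed are assumptions made here, not part of ESD 105's planning documents.

```python
from enum import IntEnum


class Tier(IntEnum):
    TEACHER_TASK = 1        # teacher uses technology to accomplish a specific task
    GROUP_FACILITATION = 2  # teacher facilitates group activities, encourages student use
    INDIVIDUALIZED = 3      # students use technology in individualized learning activities


# Activity labels paraphrase the examples in the text; the mapping is hypothetical.
ACTIVITY_TIERS = {
    "finding instructional resources online": Tier.TEACHER_TASK,
    "communicating by email": Tier.TEACHER_TASK,
    "posting grades online": Tier.TEACHER_TASK,
    "delivering visual presentations": Tier.GROUP_FACILITATION,
    "one-computer classroom lessons": Tier.GROUP_FACILITATION,
    "collecting assignments online": Tier.GROUP_FACILITATION,
    "authoring work online": Tier.INDIVIDUALIZED,
    "managing online discussions": Tier.INDIVIDUALIZED,
    "inventing products through programming": Tier.INDIVIDUALIZED,
}


def highest_tier(observed_activities):
    """Return the highest tier among recognized activities, or None if none match."""
    tiers = [ACTIVITY_TIERS[a] for a in observed_activities if a in ACTIVITY_TIERS]
    return max(tiers) if tiers else None


print(highest_tier(["posting grades online", "managing online discussions"]))
```

Using an ordered enum makes the tiers comparable, so a set of observed activities can be summarized by a single level.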


4.3 Student technology usage outside of schools 


Other than school district computer labs, students in the Yakima Valley also have access to various branches 
of the Yakima Valley Libraries (YVL). Supported through local property taxes, the rural county library 
district comprises a central library and 16 community libraries located throughout Yakima County. All
towns or cities are either annexed to or contracted with Yakima Valley Libraries for library services. 
Annexed cities include Harrah, Moxee, Selah, Yakima (Yakima Central, Southeast and West Valley), 
Sunnyside, Toppenish, Wapato and Zillah. Contracting cities include Mabton, Granger, Tieton and Naches. 
Rural county locations include Buena, Terrace Heights and White Swan. YVL currently serves over 240,000 
people in Yakima County and aims to support lifelong learning by providing free, open and full access to 
information. Migrant youth who visit YVL commonly seek the following information services: education 
and career preparation, health and wellness, English as a second language and legal consultation. The 
central branch of YVL is equipped with a lab of over 150 computers and grants wireless internet access to 
all patrons. With the majority of its staff bilingually proficient, YVL librarians rarely experience difficulty
communicating with migrant youth who just moved to the Yakima Valley with little or no English speaking 
skills. 

One of the most noticeable technology gaps for migrant students in the Yakima Valley is the lack 
of internet access outside of school facilities. Although branches of YVL throughout the region offer access 
to ICTs, various branches are located far from school premises. Students who reside in low-income public 
housing lack transportation to these libraries, which has been a commonly expressed concern among
parents and administrators. Additionally, although the central branch of YVL is equipped with over 150 
computers, other sites like the branch in Wapato are limited to 5-10 computers at any given time for general 
public use. The scarcity of public computers and limited connectivity outside school zones negatively 
impacts how these students connect to online resources to complete class assignments. Youth from low-income
families are especially at risk because they are less likely to be able to afford personal computers, essential
software or even an internet service subscription. High school students who are seeking jobs or applying for
college are also restricted to completing these online applications within school hours if they do not have
the necessary technology available in their homes. For migrant students, accessing extra computer time for
educational use before or after school can be challenging if their schedules need to accommodate
employment or family caregiving responsibilities.

The lack of concrete data on what is available in student homes is an underlying problem in realizing 
digital inclusion for ESD 105 administrators. This is especially the case for Wapato School District. 
Information about in-home technology usage is not surveyed from students. Migrant educators and 
administrators are not fully aware of the computer fluency levels of their students. Without collecting
self-reported data on personal technology inventories, a holistic picture of the information ecosystem within
K-12 education cannot be painted for the Yakima Valley. If these data were made available, teachers
could better tailor their curriculum to complement student strengths or adjust lessons to the various
technology competency levels within the classroom.


4.4 Addressing the achievement gap 


Migrant youth face numerous challenges that prevent them from succeeding in education (Figure 2). Most
common among these challenges is the susceptibility of migrant youth to missing school when their families
move from one work site to another. Furthermore, millennial workers are under greater pressure to work longer
hours on the job instead of studying for school. Frequent moves and recurrent absences mean that migrant 
students often fall behind academically. However, technology is a valuable tool that can enhance learning 
and enrich the education opportunities for migrant students. Distance learning programs that move with 
the migrant students and allow them the capacity to access their coursework from anywhere they live could 
provide the greatest potential for academic achievement among migrant populations. 

The Migrant Education Regional Office provides technical assistance to federal project directors 
and school administrators for the implementation of their Migrant Service Delivery Plan. In Washington 
State, migrant students are held to the same challenging academic performance targets and indicators that 
all students are expected to meet under the federal Adequate Yearly Progress (AYP) guidelines. State-level 
performance data for migrant students is used for policy development and to target program interventions
that assure satisfactory academic performance. MERO administrators believe that engaging parents in the 
education of their children is key to achieving student academic success. MERO’s parent services assist 
school communities in the implementation of programs designed to actively involve parents as partners in 
the education of their children in the home, school and community. MERO also provides community 
trainings for migrant parents who want to get more involved with their children’s education. These trainings 
clarify misperceptions that teachers and migrant parents have of one another, discuss barriers and solutions 
to parent involvement, and walk through the process of establishing parent advisory committees within 
school districts. Overall, MERO’s proactive efforts in engaging and empowering migrant parents enhance 
the learning success of their students. 

Aside from MERO, various nonprofit organizations in the area contribute to transforming the lives 
of young people from rural school districts in education. For instance, the Northwest Learning and 
Achievement Group (NLA) located in Wapato has after school programs that boost academic achievement 
and increase the numbers of students in these low-income areas who are applying for postsecondary 
education. With the help of NLA, thousands of Hispanic and Native American learners throughout the 
region are challenging conventional wisdom by demonstrating that they can exceed expectations and 
overcome the traditional barriers to success. NLA receives funding from the Federal Department of 
Education through the Department of Community Technology Opportunities Program, the Washington 
State Arts Commission and the Gates Foundation. Furnished with a computer lab and staffed with 
educational advisors, the NLA provides educational resources to schools, colleges and community 
organizations so that students from low-income families can reap the benefits of higher education.


5 Recommendations


The analysis of data collected from interviews with school district administrators illustrates how migrant
students are a significantly large population group that is at risk of being left behind in accessing ICT
resources. Proposed below are several strategies that can be investigated and evaluated further to improve
the educational opportunities of migrant students.


5.1 Recommendation 1: Increase access to distance learning alternatives.


Problem: Migrant youth miss school when their families move from one work site to another, causing
them to fall behind their non-migrant peers and widening the achievement gap. (Refer to Figure
2 for a topical list of seven common concerns for migrant students.)

Strategy: Increase student access to online programs to mitigate educational disruptions that impact 
grade promotion. Offer flexible courses of study that help migrant students accelerate course 
completion or finish incomplete courses. Refer migrant students to online courses and other distance 
learning opportunities for credit accrual. Provide additional instructional time in the summer or 
evening. 

Assessment: Measure the number of participating students and courses completed. Evaluate whether
supplemental technology instruction is cost-effective. Utilize technology and other tools to promote
skill building in the target content areas. 


Figure 2: Seven areas of concern for the migrant student 


Source: Educational School District 105 


5.2 Recommendation 2: Prioritize funding for ancillary migrant resources.


Problem: Non-academic services are the lowest of the four listed priorities in the Washington State 
Migrant Education Program (Figure 3), which include advocacy and outreach activities, professional 
development, family literacy, the integration of information technology into educational and related 
programs, and the transition of secondary school students to postsecondary education or 
employment. 

Strategy: Allocate more funding from various grant sources (School Improvement Grant, Title
Funds, E-Rate, etc.) to non-academic services. Lobby for or sponsor legislation that makes
non-academic services a higher-priority item in “Title I, Part C, Migrant Education Statutory
Requirements: Section 1306(a) of the Elementary and Secondary Education Act.”

Assessment: Measure the change (increase or decrease) in the graduation and retention rates of migrant
students.

• Close the achievement gap in reading, math, writing, and science.
• School readiness, increase graduation rate, and decrease drop-out rate.
• Coordination of services with State Transitional Bilingual Instructional Program and Title III English
  Language Acquisition Program
• Advocacy and outreach to migrant children and families; professional development for program; family
  literacy programs; integration of information technology into educational and related programs; and
  programs to facilitate the transition of secondary school students to post-secondary education or
  employment.


Figure 3: Washington State Migrant Education Program priorities 
Source: Washington State Migrant Education Program 


5.3 Recommendation 3: Run a pilot technology indicators survey.


Problem: School district administrators and faculty do not have any concrete data on the IT
resources available in student homes.

Strategy: Conduct an annual technology inventory survey at the beginning of each academic year
as a requirement for students to enroll in classes. Collect data on residential use of cable television,
broadband adoption and uses (including health, work, education, finance and civic engagement),
and an inventory of personal electronic devices. The survey should be available in both Spanish
and English. Inform low-income families about low-cost options available for high-speed
internet. Provide resources on purchasing computers and teach students how to maintain personal
computers safely and securely.

Assessment: The data will be compiled and analyzed by the school district’s Director of Technology
to identify significant gaps and barriers in technology access and use. This information can provide
additional insight into the holistic IT landscape of student homes and inform future budgetary
decisions that involve IT infrastructure investment.


6 Conclusion

Access to ICTs improves quality of life and empowers smaller communities driven by common identities,
ideologies and interests. ICTs now have an impact on how migrants perceive, negotiate and interact in their
environments. Without access to global information and services, and without connectivity to the wider
world, migrant communities will continue to experience economic and social attrition. Technology
integration has demonstrated that students learn quickly and with greater retention when learning with the
aid of computers.

Within Educational School District 105, administrators complete over 100 walks a month in
classrooms and computer labs to ensure high-quality instructional technology in each campus facility.
Additionally, compulsory technology-focused professional development programs allow ESD 105 staff
and teachers to stay up to date with the latest technology in educational instruction. Aside from school
facilities, migrant students in the Yakima Valley have access to regional libraries and community centers. 
However, unlike school district computer labs, computing resources in these spaces are not as adequately 
furnished or updated with educationally relevant software. 

School districts with prominent migrant populations should consider increasing access to distance
learning alternatives to mitigate the educational disruptions that occur when migrant students miss school as
their families move from one work site to another. Funding for ancillary migrant resources should be
given higher priority in the Washington Migrant Education Program’s Service Delivery Plan because these
auxiliary efforts directly impact the motivation of migrant parents to transition their children to pursue 
postsecondary education or employment. Piloting a technology indicators survey will provide migrant 
educators and administrators the missing information they need to complement technology in their future 
curriculum design and to inform long-term IT infrastructure investments. These recommendations have the 
potential to build more digitally inclusive learning communities for similarly underserved areas that are 
densely populated with millennial migrant workers.


Abstract

This paper presents the results of a content analysis of Twitter’s organizational rhetoric. Focusing on the 
language generated by Twitter’s founders in interviews and on the language that Twitter uses to describe 
its service on the Twitter.com website, this analysis establishes how these messages describe and depict 
the temporality of tweets and the Twitter platform. This study finds that nearly all of the organizational 
rhetoric sampled depicts a real-time nature of the medium while descriptions regarding what happens to 
tweets in the long-term are almost entirely absent. This finding is presented in juxtaposition with the 
Library of Congress’s announcement of the acquisition of Twitter’s full archive of tweets in 2010. 
Following this announcement, many Twitter users professed that they had not realized tweets were being
saved. In light of the results of the analysis of Twitter’s organizational rhetoric and the Library of
Congress comments, this paper discusses how Twitter’s organizational rhetoric may provide users with an
incomplete picture of the temporality of the service and of the long-term storage of tweets. This paper
concludes by discussing the potential implications for users’ abilities to self-direct and make informed
choices about the use of the platform if this organizational rhetoric is taken uncritically.


1 Introduction


In 2010, the Library of Congress announced that it had struck a deal with Twitter. In a blog post entitled
“How Tweet It Is!,” the Library declared that “Every public tweet, ever, since Twitter’s inception in March
2006, will be archived digitally at the Library of Congress” (Raymond, 2010, para. 2). Following the Library
of Congress announcement, Dylan Casey, a Google product manager, commented that “Tweets and other
short-form updates create a history of commentary that can provide valuable insights into what’s happened
and how people have reacted” (Singel, 2010, para. 10). With more than 100 million users tweeting 55 million
times a day (Huffington Post, 2010), Twitter’s archive had acquired significant cultural and historical
value.


However, despite the potential value of a Library of Congress archive, some Twitter users were not 
pleased with the announcement. Comments on the Library of Congress’ blog indicate surprise and 
frustration regarding the seemingly newfound permanence of tweets. Here are three examples: 


So with no warning, every public tweet we’ve ever published is saved for all time? What the hell. 
That’s awful. (Commenter in Raymond, 2010)


I can see a lot of political aspirations dashed by people pulling out old Tweets. I’ve always thought
of the service as quite banal and narcissistic, but I’ve had a Twitter account to provide feedback
to a college and a couple of vendors. I think I’ll close my account now. I don’t need to risk Tweeting
something hurtful or stupid that will be around for all recorded time. (Commenter in Raymond,
2010)


Now future generations can bear witness to how utterly stupid and vain we were — 1. for creating 
this steaming mountain of pointless gibberings, and 2. for preserving it for posterity. LOC, you 
nimrods. (Commenter in Raymond, 2010)


Even news reports on the announcement underscored the apparent transition from a fleeting existence for 
tweets to a newly instilled sense of permanence. For example, Wired Magazine noted that “While the short 
form musings of a generation chronicled by Twitter might seem ephemeral, the Library of Congress wants 
to save them for posterity” (Singel, 2010, para. 1). 

As careful observers may know, however, tweets have never been fleeting. Twitter has always
maintained a database of the messages sent through its system that extends back to when the service was
founded in 2006. The company is now simply sharing its archive with the Library of Congress. What the
comments on the Library of Congress blog announcement highlight (at least anecdotally) is a disconnect
between some users’ expectations for the life-span of tweets and how Twitter actually manages older tweets.
But where could this incorrect expectation have come from?

There are a number of different ways that we come to understand how a technology functions. 
Rogers (1995) suggests that we may understand technology through our sensory experiences with it, by 
watching others use it, and by consuming messages about the technology. Messages about technology serve 
as an important guide for our understanding of what a technology does, how that technology functions, and 
what that technology’s potential place in our lives might be. To say this more simply, “linguistic forms can 
have dramatic effects upon how an event or phenomenon is understood” (Gill, 2000, p. 174). How something 
is described can change or dramatically impact the way we understand it. Therefore, in order to trace how 
users’ understandings of the temporality of tweets and of Twitter are potentially being influenced, this 
paper examines the ways in which Twitter's business representatives have described the temporality of the 
service, exploring how these descriptions depict the Twitter platform and what happens to tweets. By 
examining these specific forms of discourse about Twitter, this paper traces content that may impact users’ 
understandings of the platform and expectations of the lifespan of tweets. 

Examining the way Twitter is described by its founders in juxtaposition with the Library of 
Congress commenters’ professed understandings of the permanence of tweets provides an initial inroad for 
exploring how users’ understandings of this platform are constructed and influenced, and for identifying the 
potential problems for users that may result from this influence. By exploring the institutions and spaces 
from which meaning and understanding of technology may be drawn, we can better understand the potential 
influence of rhetorics and discourses of technology in the Web 2.0 milieu. We can begin to grasp how this 
particular slice of discourse can potentially impact users’ abilities to understand and subsequently control 
information flows in the digital environment and how this discourse, if taken uncritically, could impact
users’ capacity for self-direction with regard to their use of the technology.


2 Review of the Relevant Literature 


Previous research has identified how Twitter was described by the popular press during its first few years 
of existence (Arceneaux & Weiss, 2010) and there has been some analysis of users’ opinions on the
long-term storage of tweets (Marshall & Shipman, 2011). However, absent thus far in the academic literature
are analyses of the ways Twitter talks about itself and how this language, if adopted and internalized 
uncritically, could impact users' understanding of the temporality of tweets. There is, however, a body of 
relevant literature that informs this line of inquiry and provides a justificatory basis for exploring this 
particular arrangement of users, discourse, and perceptions of technology.

When a technology is relatively new and open to a period of “interpretive flexibility” (Pinch &
Bijker, 1984), the shaping of discourse regarding a technology becomes a means for guiding its future uses
and facilitating closure (Wyatt, 2004). As Pfaffenberger (1992) notes, any new artifact “must be discursively
regulated by surrounding it with symbolic media that mystify and therefore constitute the political aims [of
the technology]” (p. 294). The discourse produced by the technology’s creators therefore serves as an
important tool that can help guide individual understandings and uses, as well as helping to structure the 
technology’s initial interpretive flexibility. 

As Twitter diffused throughout society — and as the public (and potential user pool) became 
familiar with the service — individual understandings of technology took shape. The owners of Twitter 
helped guide this process (and still do to this day) by generating their own descriptions of what the 
technology does and how users can use it. This language and this discourse then took on the form of 
organizational rhetoric. 

Cheney and McMillan (1990) describe organizational rhetoric as a system of communication with 
a common purpose that involves the coordinated activities of two or more persons. The organization, then,
“emerges and functions rhetorically through the communicative practices of its members and stakeholders”
(Cheney & McMillan, 1990, p. 101). Businesses, such as Twitter, can have specific arguments that manifest
in part through messages made publicly by organizational leaders (such as CEOs, founders, and public
relations representatives). In Twitter’s case, its founders, Jack Dorsey, Evan Williams, and Biz Stone, have
each produced messages in various media outlets that describe the technology of Twitter, what the
technology does, and how one might use it. These messages function as organizational rhetoric because the
founders represent Twitter, and they are an important object of study because they are symbolic media
meant to help guide interpretation of the technology.

Gallant and Boone (2008) argue, “Internet sites are inherently rhetorical” (p. 185). As such, the 
messaging present on the Twitter website and the structure of the website itself additionally function
in a way meant to help guide individual interpretation of the technology. Of particular importance is the 
instructional language on Twitter.com that orients first-time users and visitors to the operation of the site, 
as this similarly serves as an argument regarding the temporal properties of Twitter and the permanence of 
tweets. Analyzing Twitter’s website, in addition to the content of messages about Twitter created by 
Twitter’s founders in popular media outlets, allows for reflection on how social understanding and knowledge 
of Twitter is partially shaped through discourse, and how this language may serve its speakers’ interests.


3 Method 


3.1 Research Question 


The research question that this paper addresses is: How does Twitter’s organizational rhetoric address the 
temporality of the Twitter platform? 


3.2 Content Analysis 


This study relies on a content analysis as its mode of discovery. Content analysis is a research method that 
uses a set of procedures to make inferences from text about the message itself (Weber, 1990). This content 
analysis relies on the assumption that, within Twitter’s organizational rhetoric, there is an inherent 
argument about how Twitter should be conceptualized by a broader public just learning about the 
technology. The comments on the Library of Congress announcement point anecdotally to confusion over 
the permanence and temporality of the service. This analysis seeks out descriptive language within Twitter’s 
own organizational rhetoric that explains the temporal properties of Twitter and tweets in order to 
understand how the technology was being described through this discourse. This analysis focuses on
descriptions of what Twitter is, how Twitter operates, metaphors that compare Twitter to other
technologies, and any language that accounts for the temporality of the Twitter service and of tweets within
Twitter’s organizational rhetoric. Comparisons to other older technologies within these messages are
particularly important; as Lipartito (2003) writes, “[w]hen confronted with a truly new technology that
had not been an option before, consumers must find some way to match the unexpected with previous
experience” (p. 56). Similarly, and perhaps more bluntly, Lakoff and Johnson (1980) remind us,
“[m]etaphors create realities for us” (p. 156).


3.3 Data Collection


Twitter founders Evan Williams, Biz Stone and Jack Dorsey have been active in discussing their service in 
the media since Twitter’s founding in 2006. Individually and collectively, they have given interviews in a 
variety of news outlets, talk shows, and at a variety of technology conferences. At these locations, they have 
discussed topics regarding Twitter, what Twitter offers users and the world at large, and the history of 
Twitter. The language that this group uses to describe Twitter in interviews also inherently functions as an 
argument for how others might conceptualize and view the service. 

The interviews analyzed and considered in this study were located through searches on the founders’ 
names in video hosting sites such as YouTube and Google Video. In identifying salient interviews, 
preferential treatment was given to older interviews and interviews that occurred on major news outlets or 
talk shows. This method of selection should be considered purposive sampling, which represents a
potential limit on the generalizability of the findings. Eight interviews were considered as part of this
analysis: a 2006 interview with Evan Williams, Biz Stone and Jack Dorsey on the technology interview
program called “LunchMeet”, a 2009 interview with Biz Stone as part of the You 2.0 Documentary Project, a
2009 interview with Biz Stone on Comedy Central’s Colbert Report, a 2009 interview with Biz Stone and Jack Dorsey
on ABC’s The View, a 2009 interview with Jack Dorsey on Agora News, a 2010 interview with Biz Stone 
on CNN’s Wolf Blitzer’s Newsroom, a 2011 interview with Biz Stone on PBS’s Newshour, and a 2011 
interview with Jack Dorsey on the Charlie Rose Show. After identification, interviews were then transcribed 
and coded by hand. 

In addition to the interviews, the Twitter website itself is home to numerous rhetorical messages 
that contain language meant to guide users’ sense-making process. This study approaches the rhetorical 
messages on Twitter’s homepage as they appeared to an individual who is using the site through a web
browser for the first time. This distinction is necessary as Twitter offers mobile versions of its site and as
there are numerous applications for various mobile and cellular devices that also interface with Twitter. 
These other locations are areas to be explored in future work. The analysis this study undertakes includes 
the landing page for Twitter.com, the sign-up page, the “Home” page, the Terms of Service, and Twitter's 
Privacy Policy. 


3.4 Coding Schema 


The types of content within the interviews and within Twitter’s website that are of interest to this study are
descriptions of what Twitter is, how Twitter operates, metaphors that compare Twitter to other
technologies, and any language that accounts for the temporality of Twitter or tweets. Once identified, this
language was then coded into one of three categories that emerged during the identification of salient
content: language that suggests that Twitter maintains an archive of tweets, language that suggests Twitter
does not maintain an archive of tweets, and language that focuses on the real-time nature of tweets,
neglecting any description of how tweets are treated in the long-term.
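
The three-category schema can be pictured as a small bookkeeping structure. The sketch below is only an illustration and not the procedure used in this study, where interviews were transcribed and coded by hand. The names TemporalCode, CodedExcerpt, and tally are hypothetical, and the two sample records simply reuse passages quoted in the findings of this paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TemporalCode(Enum):
    """The three categories of the coding schema (member names are illustrative)."""
    ARCHIVE = "suggests Twitter maintains an archive of tweets"
    NO_ARCHIVE = "suggests Twitter does not maintain an archive of tweets"
    REAL_TIME_ONLY = "focuses on real-time nature; silent on long-term treatment"


@dataclass(frozen=True)
class CodedExcerpt:
    """One passage of organizational rhetoric with its hand-assigned code."""
    source: str
    text: str
    code: TemporalCode


def tally(excerpts):
    """Count excerpts per category, keeping categories that have zero hits."""
    counts = Counter(excerpt.code for excerpt in excerpts)
    return {code: counts.get(code, 0) for code in TemporalCode}


# Two passages quoted in this paper, coded as real-time only.
excerpts = [
    CodedExcerpt(
        "Blitzer, 2010",
        "an information network that's very focused on real-time",
        TemporalCode.REAL_TIME_ONLY,
    ),
    CodedExcerpt(
        "AgoraNews, 2009",
        "It allows people to interact in real-time",
        TemporalCode.REAL_TIME_ONLY,
    ),
]

for code, count in tally(excerpts).items():
    print(f"{code.name}: {count}")
```

Keeping the zero-count categories visible in the tally matters for this kind of analysis, because the absence of archive-related language is itself the finding of interest.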


4 Findings


4.1 Interviews 


In the interviews analyzed here, Twitter’s founders never exactly say that tweets are kept indefinitely, nor 
do they ever exactly say that tweets are ephemeral. The long-term storage of tweets is not explicitly 
discussed. Instead, the most common descriptions of Twitter itself are messages that describe Twitter as a
“real-time” medium while neglecting any explanation of how tweets are treated in the long-term. Six of the
eight interviews considered for the analysis contain this type of content exclusively. These types of
descriptions predominantly focus on the immediacy of the technology. For example, in an interview with
Wolf Blitzer on CNN, Biz Stone was asked to summarize the real point of Twitter. He responded:


I’d say the real point of Twitter is to help people discover and share what it is that’s happening
around them in their world. It really has become an information network that’s very focused on 
real-time [emphasis added]. (Blitzer, 2010) 


This style of response, with particular focus on words such as “real-time” and “immediacy,” is present
in other founders’ remarks as well. When asked what Twitter is best at in a 2009 interview with AgoraNews, 
Jack Dorsey responded: 


Well I think...I think what Twitter is best at is only that the sum of the people that use it. I think 
as a technology it brings a lot of immediacy to the conversation. It allows people to interact in real-time
and it allows a great mass of people to interact and report from wherever they are and whatever
they're doing. So I think that that really engages people in a way like never before so you can, you 
can be out witnessing something, you can be out helping someone, you can be at, you know, a hall 
of government and just talk about what you're seeing what you're experiencing and other people 
read that in real time and that may inspire them to act on their own [emphasis added]. (AgoraNews, 
2009) 


However, not all of the interviews contained these consistent descriptions or metaphors for what Twitter is 
and what Twitter is like. Two interviews contained metaphors that constitute conflicting messages regarding 
the temporality of Twitter. The first appeared in a 2006 interview with “LunchMeet”. In this interview,
when describing Twitter, the founders referred to Twitter being like “a chatroom” (Slutsky & Codel, 2006),
which is a technology that may or may not have centralized message storage. Seconds later, they also state
that the service is “like LiveJournal” (Slutsky & Codel, 2006). In terms of metaphors, LiveJournal.com is a
blogging/diary platform substantively different from a chat room, with quite different message retention.
LiveJournal maintains an accessible database of posts made to its servers whereas a chat room may or may
not. Given that this was an interview very early in Twitter’s existence as an organization, it is possible that
Twitter had not standardized its messaging quite yet. In a much later interview, when Biz Stone and Evan 
Williams appeared on ABC’s morning talk show The View, the two contradict the statements made in the 
2006 interview. In explaining Twitter to The View host Whoopi Goldberg, Biz Stone states, “It’s really 
different than e-mail, chat rooms and all this stuff you might be used to” (Walters, 2009). This is also the 
first instance of a negative association being presented in the interviews. Here the audience is told that 
instead of Twitter being like other technologies, it is actually unlike these other technologies. 

In the same interview on The View, Stone uses language that suggests a level of ephemerality to 
the medium and reemphasizes its real-time nature. He states: 


If you ignore e-mail for a few days it just piles up. Social networks, the same thing, are you my
friend, yes/no. That’s not what Twitter is. Twitter is an information network, you go on and you
say, ‘I want to follow this source of information, I want to follow CNN, I want to follow The View,
I’d like to follow Ev [Evan Williams], I’d like to follow my mom, and I want to curate this
information in real-time and receive it in real-time because it’s meaningful to me, and in that way
it’s very different’ [emphasis added]. (Walters, 2009)


Here, we can see how the metaphor of messages piling up could conjure the image of an archive and a sense
of longevity, and within this message we are told that Twitter is not like this. This statement is then
immediately followed with a renewed focus on the real-time nature of the platform. While Twitter’s founders
are not explicitly claiming that Twitter is ephemeral, the content of this particular message does offer a
potentially misleading metaphor that could influence a user’s understanding of this technology. If
taken uncritically, this problematic metaphor, in combination with a few conflicting messages and
the heavy focus on the “real-time” nature of the platform in Twitter’s organizational rhetoric,
could play a role in the development of user expectations and understandings of the longevity of tweets
that do not match what actually happens to tweets.


4.2 Twitter.com 


The Twitter website itself is home to numerous messages in the form of organizational rhetoric that may 
guide users in the sense-making process. Almost all of these messages orient users exclusively towards the 
real-time nature of Twitter while neglecting any discussion of how tweets are stored in the long-term. The 
first page that a visitor to the Twitter website arrives at contains large text on the left-hand side of the
screen stating “Welcome to Twitter. Find out what’s happening, right now, with the people and
organizations you care about [emphasis added]” (twitter.com, 2013a). Older versions of the website included
similar language such as:


Discover what’s happening right now, anywhere in the world. Twitter is a rich source of instant 
information. Stay updated. [Twitter’s landing page, 2010, emphasis added] (Social Media 
Performance Group, 2013) 


Follow your interests. Instant updates from your friends, industry experts, favorite celebrities, and 
what’s happening around the world [Twitter’s landing page, 2011, emphasis added]. (Social Media 
Performance Group, 2013) 


Across these statements, we can see how visitors are, and have been, oriented towards the real-time and 
instantaneous nature of Twitter as soon as the landing page loads. 

The sign-up page is the next page that a user who does not already have an account would 
encounter. On this page, a new user is asked for “Full Name”, “E-mail”, “Password”, and “User Name”. 
Underneath, text appears that states: 


By clicking the button [The button reads: Create my Account], you agree to the terms below: These 
Terms of Service ("Terms") govern your access to and use of the services, including our various 
websites, SMS, APIs, email notifications, applications, buttons, and widgets, (the "Services" or 
“Twitter”), and any information, text, graphics, photos or other materials uploaded, downloaded 
or appearing on the Services (collectively referred to as "Content"). (twitter.com, 2013b) 


Only when a user clicks on the box containing the Terms of Service (not a necessary or required step) does 
the box expand to the full eight print pages of text, thereby revealing the full user agreement (but not the 
privacy policy, though there is a link to the privacy policy on the page). Twitter’s Terms of Service and 
Privacy Policy are the documents that govern user access and use of the Twitter service. While anyone who
has ever set up a Twitter account has agreed to these conditions, a 2011 survey found that “Only 18 percent
of social media users surveyed said that they read the terms and conditions for posting to the sites they 
use” (Dugan, 2011, para. 7). By agreeing to these conditions, “you consent to the collection and use (as set 
forth in the Privacy Policy) of this information [any information that you provide to Twitter]” (twitter.com, 
2013d, para. 6).

Despite their length, the Terms of Service and Privacy Policy never explicitly state that Twitter
maintains a permanent record of tweets, nor do they state that tweets are ephemeral. Instead, the Terms of
Service include statements such as “What you say on Twitter may be viewed all around the world instantly
[emphasis added]” (twitter.com, 2013d, para. 3), and “By submitting, posting or displaying Content on or
through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to
sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such
Content in any and all media or distribution methods (now known or later developed)” (twitter.com,
2013d, para. 12). In these quotes, there again appears language that invites users to consider the real-time
nature of the medium. Absent from this is language that might present a user with messaging that describes 
the long-term storage of tweets. While this license may grant Twitter the legal right to archive tweets in 
perpetuity and share such an archive with the Library of Congress, there does not appear to be any 
predominance of language that would invite a reader to understand that this was happening. 

Once a user has completed a brief orientation that describes the process of how to follow users, 
they are taken to the primary Twitter interface. The Twitter “Home” interface itself has changed 
significantly since its original design in 2006. The question that appeared at the top of the screen in 2006, 
“What are you doing right now?” was eventually replaced by the question, “What’s happening?” and, as 
of the time of writing in 2013, has been replaced with the much simpler and less inquisitive “Compose 
a new Tweet...” (twitter.com, 2013c). A text input box allows users to enter a response, with a button next 
to it marked, “Tweet”. Clicking this button sends the message off into the world of Twitter. A message just 
sent shows up in a user’s “Timeline”, the area directly underneath the input box. The Timeline displays, 
chronologically, both the messages of the user and the messages that have been sent by individuals that a 
user follows. Located on the left hand side of the screen is information about who a user is following, who 
is following that user, suggestions for more people to follow, and an area marked, “Trends”. Within the realm 
of this interface there are several inherent rhetorical messages about the way that users should experience 
the site and the historicity of messages. 

The historical prompts “What’s happening?” and “What are you doing right now?” invite a user 
to form a response tweet that is of the moment; less so “Compose new Tweet...” These are questions that 
Twitter seems to be asking of users (or perhaps, one’s followers are asking of the user). Regardless of 
attribution to a speaker however, the historical prompts orient a user towards the “real-time.” 

When the user enters a tweet, it is immediately populated within the chronological timeline on the 
user’s page. A small bit of text to the right of each “tweet” appears in the timeline that describes how long 
ago that message was posted. The twenty most recent tweets appear in the timeline as a default. Only when 
a user scrolls down further and further on the page do older messages appear. Despite the fact that these 
older messages appear, there is a technical limit on the number of “tweets” that can be accessed through 
the timeline: 3200 (Owens, 2011). A user can only “go back” 3200 tweets into their history before the site 
will load no more. In January of 2013, Twitter began rolling out a new feature that does allow 
users access to their own historical archive, but not that of others in their timeline, and it does not populate 
the messages through the timeline itself. However — and importantly when considering the implications for 
Twitter’s overall efforts at shaping user understandings of the technology — Twitter went seven years 
before implementing this feature. There are multiple readings that can be made of the Twitter timeline 
itself. 

The most recent tweets appear at top, and are therefore the first thing that a visitor sees. This 
seems to orient users towards the “real-time” nature of the medium by providing the most recent 
communications first. While some have compared Twitter to a diary, a paper diary confronts a viewer with 
the oldest messages first. A diary confronts a viewer with a history — a diachronic display of messages — 
by making all of the messages it contains visible. Twitter does not make the entire body of messages that 
a user has posted on the site immediately visible. Instead, they are hidden from sight, only viewable once a 
user begins to scroll down through their timeline. In this way, users are oriented towards the “real-time.” 
However, if they choose to (and they must choose to in order to see it), users can access older 
messages by scrolling down. This creates the possibility that users may draw the conclusion that their 
entire history of messages is in fact being archived. It creates the possibility that, despite the orientation 
towards the real-time, users may realize the longevity of their tweets. The fact that this particular section 
is even called, “Timeline” seems to favor this interpretation. Calling it a “Timeline” invites a user to imagine 
a sense of history. However, a user can only access 3200 messages within the Timeline. It would seem 
difficult to imagine that a user would ever scroll back through 3200 messages, as this would require some 
160 “next 20 messages” page loads and would certainly necessitate a lot of patience. However, for those 
that did go that far back and then found they could load no more messages, what conclusion could they 
reasonably come to? It seems plausible that if a user could not access older messages beyond the 3200 tweet 
limit, that they might draw the conclusion that these tweets no longer existed. Here, the rhetoric of the 
technology itself is ambiguous, and an interpretation of ephemerality is possible if one concluded: once a 
certain number of tweets are populated, the old ones disappear. Of course, this is not the technical reality. 
Twitter itself maintains all tweets in any Timeline beyond the 3200 cut-off point. 

In summary, the messaging present on Twitter.com primarily orients users towards considering the 
real-time nature of the medium, instead of an understanding of tweets as ephemeral or permanent. It does 
not seem like much of a stretch to say that Twitter’s organizational rhetoric is focused on the here and 
now. Users are given an interface and tools on that interface that orient them towards the most recent, the 
current, and the trendy. Until the introduction of the individual archive retrieval function in early 2013, 
users were not given an interface, or tools, or a predominance of messaging that would orient them towards 
considering their entire history of tweets. Even today, the individual archive retrieval tool is buried at the 
bottom of the user’s settings page, and Twitter does not make mention of the Library of Congress archive 
in the text someone new to the service would encounter in the process of signing up. 


5 Discussion 


Twitter is not real-time. It is a simulacrum: an archive of messages that have been posted to a database some 
time ago. When we go to Twitter, we are not seeing the present, we are seeing the (sometimes not very 
distant) past. However, there is a critical reason that Twitter, as an organization, may have chosen to use 
this language of real-time and to focus on the immediacy of the medium: It is imperative for their business 
model. Twitter must create a self-fulfilling prophecy, discursively regulating the platform by surrounding it 
with symbolic media that constitute it as a place to go to gain access to real-time information (to borrow 
from Pfaffenberger). 

John Perry Barlow (1994) wrote, “Most information is like farm produce. Its quality degrades 
rapidly”. Yesterday’s news or gossip is not as valuable as today’s. Twitter’s value is highly dependent on 
the freshness of the content on its site. But in order for Twitter to offer “real-time,” they require a massive 
user-base that is constantly producing the real-time. To recruit this user-base, Twitter must offer a view of 
the world in 140-characters whose refresh rate is as up to date as possible, thereby offering a tantalizing 
source of information with a particular kind of value. Therefore, the success of Twitter as a business is 
partially dependent on their ability to position their platform through organizational rhetoric and discourse 
and to begin to shape the future uses of this technology through these tools. Through this messaging, 
Twitter’s founders speak of capturing the “real-time” and argue that they offer an exquisite and unique tap 
into what’s happening right now. Unfortunately, because this discourse remains silent about what happens 
to tweets in the long term, users who rely heavily on it as an input into their understandings of 
the technology may be put at a disadvantage. 

The Library of Congress was not alone in announcing its 2010 partnership with Twitter. One of 
Twitter’s founders also made an announcement on Twitter’s official blog. In this announcement, Biz Stone 
explained that tweets have “become part of significant global events around the world” (Stone, 2010, para. 
2), and that, “A tiny percentage of accounts are protected, but most of these tweets are created with the 
intent that they will be publicly available” (Stone, 2010, para. 2). Despite the assertion that Twitter users 
understood that what they were creating was “public,” Stone’s blog entry contains no discussion of whether 
or not users realized that tweets might ever become part of a nation’s permanent historical collection. The 
results of the content analysis suggest that, based purely on the sampled organizational rhetoric, it 
would be quite unusual to develop this expectation. While users are oriented towards the “real-time,” there 
is a dearth of language and messaging that suggests the longevity of tweets. Users are given an interface, 
tools, and text that directs them towards the most recent and generally away from considering what happens 
to tweets in the long term. Combined with the inability to view Timeline histories beyond a 3200 message 
threshold until the 2013 addition of the personal archive feature, perhaps the anecdotal confusion that some 
users had over the permanence of tweets makes sense. Perhaps it should not be surprising that some users 
would have expected that older tweets remained inaccessible. And even if all of those messages were 
created with the intent that they would be publicly available, does this mean that users understood 
how long that public life-span would be? 

In an interview discussing how journalists approach using Twitter, Andy Carvin, National Public 
Radio’s senior product manager for online communities, manages to capture the complex relationship 
between the real-time nature of Twitter and the long-term implications of an archive of individual tweets, 
stating: 


When I’m tweeting, I generally don’t think about whether I’m contributing to a historical record. 
There are definitely times when I feel the information I’m retweeting certainly is, but not really for 
my own tweets... Generally, when something big is going on, I’m in the zone and not thinking of 
much else except capturing what’s happening and figuring out what’s true. I definitely try to add 
context when it seems appropriate, but it’s really directed at real-time consumption. (Tenore, 2011, 
para. 4). 


Carvin’s sentiment — that he does not think about whether he’s contributing to a historical record in the 
moment — seems quite reasonable. Having every user contemplate the historical record every time they 
tweet would add a major speed bump of reflexivity. At 500 million tweets a day, perhaps it is much easier 
to simply not think about the digital trail that is left behind over time as we engage in communication in 
this medium. When the messaging present about Twitter focuses users on the real-time and is 
simultaneously silent, ambiguous, or even occasionally uses problematic metaphors that may influence 
understandings about the indefinite storage of tweets, users are dissuaded from engaging in temporal 
reflexivity. They are discouraged from considering both the historical record and the future of this data. 

This raises a number of potential concerns and questions about user agency and self-direction. First, 
as Yochai Benkler (2007) suggests, “A fundamental requirement of self-direction is the capacity to perceive 
the state of the world, to conceive of available options for action, to connect actions to consequences, to 
evaluate alternative outcomes, and to decide upon and pursue an action accordingly” (p.147). If self- 
direction is predicated upon an individual’s perception of the world, and the organizational rhetoric and 
messaging about Twitter helps shape this perception, and this language was misleading, ambiguous, or 
created any illusion of ephemerality, individuals could be impeded in their ability to set appropriate ends 
in their use of the technology. 

Second, as the Library of Congress intends on making this archive of Twitter available to 
researchers, it is important to ask questions about the secondary use of this data set. Just because a user 
may have sent a tweet, does this now indicate that they understood that tweets would be around for 
(virtually) forever, archived in the Library of Congress? Further research is needed to establish what Twitter 
users’ understandings of the temporality of the platform actually are outside of the anecdotal comments on 
the Library of Congress archive announcement. 

Lastly, there are a number of interesting political economic issues at stake here that warrant further 
investigation. As Scholz (2008) points out, much of the labor on Web 2.0 sites is unpaid user labor on which 
the businesses that run these sites rely in order to turn a profit. In the case of Twitter, language 
that describes the platform as anything other than “real-time” could prompt user reflexivity regarding the 
service. User reflexivity in turn could impede the timely production of tweets, and when Twitter’s business 
and profits depend on having users populate information into a system as quickly as possible in order to 
produce a simulacrum of real-time, it may be a lot easier not to give users a reason to slow down and 
contemplate both the historical record and the future of this information. 


6 Conclusion 


Through a content analysis of Twitter’s organizational rhetoric present in interviews and of the Twitter 
website itself, this paper has demonstrated that nearly all of the messaging about Twitter created by 
Twitter, particularly early in Twitter’s existence, focused on establishing the real-time nature of the medium 
while neglecting descriptions about what happens to tweets in the long-term. In light of this messaging, this 
paper has also highlighted a number of unresolved questions about assumptions regarding users’ intent for tweets, 
about users’ abilities to self-direct, and about the use of the Library of Congress archive. There is, however, 
more work that needs to be done to more thoroughly investigate Twitter users’ understandings of the 
medium and the longevity of tweets, to see how user understandings do or do not conflict with descriptions 
of the medium, and to trace the influence that Twitter’s organizational rhetoric has had. This paper is 
nonetheless an important first step that establishes how Twitter has been described through the rhetoric of its 
founders. 

Abstract 

Sixty undergraduate students were paired up and participated in a usability study of the Library Explorer 
software on a Microsoft PixelSense Tabletop. Specific investigation into the impact of diversity attributes 
on collaboration style, collaboration quality, task performance, and participants’ perception revealed 
interesting patterns. Problem solving ideas were coded as suggestions. Gender composition and task 
performance were found to be significantly associated with the frequency of suggestions’ non-acceptance. 
Results show that during participants’ collaborative discovery on interactive tabletops, gender and racial 
diversities, while directly influencing their collaboration styles and processes, did not impact their team 
performance. Diversity attributes that significantly correlated with team effectiveness included native 
language diversity, differences in tabletop use experiences, usability ratings, and the frequency of 
suggestions not being accepted. Findings of the study not only enrich the understanding of the connection 
between team compositional diversity and collaboration styles, but also provide insights on how team 
members’ suggestion behaviors may help capture the dynamics of collaboration on interactive tabletops. 


Keywords: diversity attributes, collaboration styles, quality of collaboration process, problem solving suggestions, team 
performance, user perception 

Citation: Tang, R., & Quigley, E. (2014). The Effect of Undergraduate Library Users’ Dyadic Diversity Attributes on Interactive 
Tabletop Collaboration, Performance, and Perception. In iConference 2014 Proceedings (p. 88-109). doi:10.9776/14047 
Copyright: Copyright is held by the authors. 
Acknowledgements: This research study is funded by The Emily Hollowell Research Fund from Graduate School of Library and 
Information Science, Simmons College and by the Simmons College Faculty Fund for Research. Authors wish to thank Harvard 
Library and in particular Harvard Cabot Science Library for their support. Special thanks go to Christopher Erdmann, Lynne 
Schmelz, Michael Leach and Susan Berstler. Authors also wish to express their gratitude to Jeremy Guillette, Xiaoyan Song, and 
Joan Hagler, who were involved in data collection. Finally, we wish to thank all the volunteers and participants. We would also 
like to acknowledge Andries van Dam, Alex Hills, Mathew Ashby, and James Chin, and the Brown Graphics Group. Thanks go to 
Dan Gregson and Jeff Bernhard from Harvard University for their IT support. 

Contact: rong.tang@simmons.edu, equigley@iq.harvard.edu 


1 Introduction 


In April 2012, a private university located in the northeast region of the United States installed the Microsoft 
SUR40 tabletop (SUR40 henceforth) in three of its libraries, featuring the open source software “Library 
Explorer” (LE henceforth) developed by the Brown Graphics Group in collaboration with the university 
library. The LE software enables users to view collections of large format 2D artwork on a large touch 
screen table. LE also allows librarians to “prepare content and appropriate metadata and related assets to 
be viewed” by patrons (LADS User’s Guide, 2011, p.1). For an overview and video demonstration of the features 
and functions of LE, readers are referred to the hyperlinks under the item (Harvard Library UX, 2012-13) 
in the “References” section. Shortly after the installation of the SUR40 and LE, a multi-phased usability 
research study was conducted with Phase I involving 29 participants completing tasks individually and 
Phase II featuring two participants working collaboratively on tasks during a test session. This paper reports 
the results that pertain specifically to the impact of diversity attributes on collaboration characteristics and 
performance outcomes as participants work together to learn a new library tool on an interactive tabletop. 

While the existing research on interactive tabletops has addressed the topic of users’ collaboration styles 
when using tabletops in a particular context (e.g., Akerman, Puikkonen, Huuskonen, Virolainen, & Hakkila, 
2010; Marshall, Morris, Rogers, Kreitmayer, & Davies, 2011; Rick, Marshall, & Yuill, 2011), seldom has any 
study investigated how library users collaborate on an interactive tabletop to explore library digital 
collections. Meanwhile, even though there is an abundance of research on the relationship between team 
composition and team performance (e.g., Bell, 2007; Cannon-Bowers, Salas, & Converse, 1993; Fisher, Bell, 
Dierdorff, & Belohlav, 2012; Klimoski & Mohammed, 1994; Rentsch & Klimoski, 2001; Webber & Donahue, 
2011), such investigation has not been thoroughly extended to the impact of dyadic diversity on 
collaboration process, task performance, and user perceptions. Furthermore, although there are a number 
of research studies on collaborative learning discussions and verbal participation (e.g., Flecker et al., 2009; 
Shaer et al., 2011), there is almost no research that has correlated the number and state of team members’ 
suggestions (i.e., accepted or not accepted) with team diversity attributes, collaboration styles, and team 
performance. 

The present research has been guided by a number of theoretical frameworks. Using the operational 
definitions and models of (1) collaboration profiles (Shaer et al., 2011), (2) collaboration process scaling 
(Meier, Spada, & Rummel, 2007), (3) collaborative learning mechanisms framework (Flecker et al., 2009), 
and (4) multi-level diversity and team mental model similarity (Fisher et al., 2012; Harrison, Price, & Bell, 
1998; Miliken & Martins, 1996), this paper reports sets of results concerning the association between various 
diversity dimensions and collaboration style and team performance. In particular, dimensions of diversity 
include those that pertain to demographic (gender, race, and native language), academic (status and 
discipline), and use experience. Use experience consists of participants’ experience with tablet and smart 
phones, and their experience with the SUR40 or other types of interactive multi-touch tabletops. 

In this paper, the term “diversity” refers to “differences between individuals on any attribute that 
may lead to the perception that another person is different from self” (van Knippenberg, De Dreu, & Homan, 
2004, p.1008). Diversity research typically examines demographic attributes such as gender, age, 
race/ethnicity, and job-related diversity attributes such as organizational and group tenure, educational 
background and functional background (van Knippenberg et al., 2004; Williams & O’Reilly, 1998; Wolff, 
Ratner, Robinson, Oliffe, & Hall, 2010). In this study, where dyadic composition is formed by experimental 
set-up rather than by participants’ own choices, the term diversity is used to describe categorical 
heterogeneity that appears both at the surface level and at less visible levels (academic background, use 
experience, etc.), “as opposed to hierarchical differences (i.e., disparity) or differences along a continuum 
(i.e., separation)” (Fisher et al., 2012, p.831). 
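
To make the notion of categorical heterogeneity concrete, the following minimal Python sketch flags, attribute by attribute, whether the two members of a dyad differ. It is an illustration only: the `Participant` record, its field names, and the `dyad_diversity` helper are hypothetical and are not part of the study's instruments, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    """Hypothetical record of one participant's categorical attributes."""
    gender: str
    race: str
    native_language: str
    academic_status: str
    discipline: str


# Attribute names are assumptions chosen to mirror the dimensions named in the text.
ATTRIBUTES = ("gender", "race", "native_language", "academic_status", "discipline")


def dyad_diversity(a: Participant, b: Participant) -> dict:
    """Return, per attribute, True when the two dyad members differ.

    This treats diversity as categorical heterogeneity (same vs. different),
    not as disparity or as distance along a continuum.
    """
    return {name: getattr(a, name) != getattr(b, name) for name in ATTRIBUTES}


# Invented example dyad, for illustration only.
first = Participant("female", "Asian", "Mandarin", "sophomore", "biology")
second = Participant("male", "Asian", "English", "sophomore", "history")
print(dyad_diversity(first, second))
```

Under this coding a dyad is either homogeneous or heterogeneous on each attribute, which is the form of variable that can then be associated with collaboration style or performance.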

Note also that in this paper, the scale for evaluating dimensions of collaborative process is frequently 
referred to as quality of collaboration. The phrase “collaboration style” is frequently used in place of 
“collaboration profile.” As was observed from the 30 research sessions, most of the dyads did not just have 
one single consistent collaboration profile throughout their sessions. Instead, as they progressed through their 
tasks, participants often switched between two or more collaboration approaches. Consequently, the term 
“style” was deemed more appropriate to describe the dynamic nature of participants’ interaction and 
collaboration behavior. 


2 Literature Review 


There is a rich set of empirical research focusing on collaborative learning on interactive tabletops. In the 
fields of social psychology, human factors, and management, there is also a wealth of literature on team 
composition, team mental models, and task performance. For the purpose of this paper, the review covers 
two relevant schools of empirical research: (1) collaborative styles/process and diversity levels, and (2) team 
composition and performance. 

2.1 Collaborative Styles and Processes on Interactive Tabletops 


A great number of previous research studies on interactive tabletops explored users’ social behavior when 
using a tabletop as a group (e.g., Meier et al., 2007; Piper & Hollan, 2009; Rick et al., 2011; Shaer et al., 
2011). Papers by Shaer et al. (2011) and Schneider et al. (2012) on collaboration profiles are directly relevant to 
this study. Based on their research on the dyadic interaction using G-nome Surfer 2.0 on an interactive 
tabletop, Shaer and her team (Shaer et al., 2011) developed a typology for collaboration profiles. Recently, 
Schneider and his coauthors (Schneider et al., 2012) conducted a study comparing students’ learning of 
phylogenetic trees under two conditions: a multi-touch tabletop interface and a pen & paper activity. It 
was found that the tabletop implementation produced significantly higher collaboration activities such as 
dialogue management, information pooling, technical coordination, and individual task orientation. Table 
1 describes the four categories of dyadic collaboration using Shaer et al.’s (2011) taxonomy of collaboration 
profiles and Schneider et al.’s (2012) adopted description. 


Profile Description 

Turn-Taker Both users make and accept suggestions and observations 

Driver-Navigator Both users are engaged. The navigator contributes with suggestions and 
observations 

Driver-Passenger The driver is fully engaged, the passenger is not focused on the task 

Independent Users are absorbed in their own activity; minimal verbal communication 


Table 1: Collaboration Profiles and Their Brief Descriptions, Sources: (Shaer et al., 2011; Schneider et al., 
2012) 


Also of salience to studies of collaborative learning are the mechanics of collaboration outlined by Pinelle, 
Gutwin, and Greenberg (2003) as primitives for Collaborative Usability Analysis (CUA). Two broad 
categories of activities were described: communication and coordination. Communication consists of explicit 
communication and information gathering. Coordination involves shared access and transfer. Similarly, 
Meier, Spada and Rummel (2007) incorporate communication and coordination as two of the five dimensions 
of collaboration. The five process dimensions contain a total of nine attributes; see Table 2 for a full list of 
attributes and dimensions. The authors applied a five-point scale ranging from -2 (very bad) to +2 (very good) 
for their assessments of 40 collaborating dyads. Meier and her colleagues concluded that the rating scheme 
they developed may be used as “generic assessment methods” for computer supported collaborative learning 
(Meier et al., 2007, p.81). 


Dimensions                      Attributes 

Communication                   1) Sustaining mutual understanding 
                                2) Dialog management 

Joint information processing    3) Information pooling 
                                4) Reaching consensus 

Coordination                    5) Task division 
                                6) Time management 
                                7) Technical coordination 

Interpersonal Relationship      8) Reciprocal interaction 

Motivation                      9) Individual task orientation 


Table 2: Collaboration Process Dimensions, Source: (Meier et al., 2007) 
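
As a rough illustration of how such a rating scheme could be recorded, the Python sketch below stores one integer rating per attribute on the -2 to +2 scale and summarizes each dimension. The grouping of attributes under dimensions follows our reading of Meier et al.'s scheme; the function name `dimension_scores` and the choice to average ratings within a dimension are assumptions made for this example, not a procedure reported in the source.

```python
from statistics import mean

# Five process dimensions and their nine attributes (after Meier et al., 2007).
DIMENSIONS = {
    "Communication": ["Sustaining mutual understanding", "Dialog management"],
    "Joint information processing": ["Information pooling", "Reaching consensus"],
    "Coordination": ["Task division", "Time management", "Technical coordination"],
    "Interpersonal relationship": ["Reciprocal interaction"],
    "Motivation": ["Individual task orientation"],
}

ALL_ATTRIBUTES = {attr for attrs in DIMENSIONS.values() for attr in attrs}


def dimension_scores(ratings: dict) -> dict:
    """Average one dyad's attribute ratings within each dimension.

    `ratings` maps each of the nine attribute names to an integer from
    -2 (very bad) to +2 (very good). Averaging is an illustrative
    assumption, not a step described in the source.
    """
    if set(ratings) != ALL_ATTRIBUTES:
        raise ValueError("ratings must cover exactly the nine attributes")
    for attr, value in ratings.items():
        if not isinstance(value, int) or not -2 <= value <= 2:
            raise ValueError(f"rating for {attr!r} must be an integer in [-2, 2]")
    return {
        dim: mean(ratings[attr] for attr in attrs)
        for dim, attrs in DIMENSIONS.items()
    }
```

Keeping the validation separate from the summary makes it easy to swap the mean for another aggregate if a different summary of the nine ratings is preferred.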


In addition to profiling and measuring collaborative learning, researchers also observed how participants 
collaborate through their verbal exchanges (Shaer et al., 2011) and through making and accepting suggestions 
(Flecker et al., 2009). Shaer and her colleagues (2011) investigated the verbal and physical participation of 
24 dyads when they used G-nome Surfer software through a multi-touch tabletop and a multi-mouse 
computer GUI. The authors grouped the levels of verbal participation using the categories of insight, 
coordination, brief response, syntax, problem solving, and disengagement. In an earlier study, Fleck and her 
research team (2009) conducted a case study of 27 school children working in groups of three with classroom 
seating plan software on a DiamondTouch tabletop. Based on the data, the authors validated and extended 
their Collaborative Learning Mechanisms (CLM) framework. Table 3 covers only the verbal aspects of the 
CLM framework. 


Verbal Aspects of Collaboration Discussion 

Making and accepting suggestions 
  Presentations   • Making verbal suggestions and giving opinions 
  Acceptances     • Listening to others’ suggestions and opinions 
                  • Asking for clarification of verbal or physical suggestions 

Negotiating 
                  • Making, listening to and responding to each other’s suggestions 
                  • Making alternative suggestions 
  Disagree        • Explanation of own ideas 
                  • Justification of own actions 
                  • Verbal blocking 

Maintaining joint attention and awareness 
  Narrations      • Inform others about your actions 


Table 3: Mechanisms for Coordinating Collaborative Discussion and Action, Source: (Flecker et al., 2009) 


2.2 Levels of Diversity, Team Process, and Team Performance 


Another relevant set of research on collaborative interaction is the work on group cognition and team mental 
models. Klimoski and Mohammed (1994) created a framework to explain the role of the team mental model 
in team performance. In their framework, an individual’s potential for performance is linked to team 
capacity, then team process, and then leads to team performance. Klimoski and Mohammed (1994) specify 
that team capacity is the result of team members’ individual potential interacting with two team level 
parameters: composition/size and resources. Team composition is defined as “the (gender/ skill/ age/ 
experience) make-up of the team” (p. 430). While team capacity may have an impact on an effective team 
process and thus high levels of team performance, the authors argue the factors of team mental model and 
leadership also contribute to the team process and thus lead to high performance. 

In operational research and organizational theory, scholars have proposed different taxonomies to 
describe team composition diversity. Miliken and Martins (1996) suggest that in addition to observable 
diversity such as race and gender, underlying diversity such as differences in values, skills and knowledge 
and cohort membership holds different degrees of impact on team cognitive outcomes, communication-related 
consequences and team performance. In a similar vein, Harrison et al. (Harrison et al., 1998; Harrison, 
Price, Gavin, & Florey, 2002) outline two levels of diversity: surface level, which is defined as differences in 
demographic characteristics among team members such as age, sex and race/ethnicity, and deep level, which 
refers to differences in psychological characteristics such as attitudes, personalities and values. Yet another 
distinction was made by Pelled and her colleagues (Pelled, 1996; Pelled, Eisenhardt, & Xin, 1999; Simmons, 
Pelled, & Smith, 1999), who separate those diversity attributes that are highly job related such as 
education, functional background, and industry experience from those that have low job relatedness such 
as age, race and gender. Pelled proposed that low job related diversity attributes could lead to high affective 
conflict and thus negatively influence cognitive task performance. On the other hand, high job related 
diversity variables are seen as positively correlated with substantive conflict which is also positively 
associated with high group performance on cognitive tasks. 

Several researchers investigated composition diversity as antecedents of team member schema 
agreement (Rentsch & Klimoski, 2001), team decision making quality (Cannon-Bowers et al., 1993), and 
team mental model similarity (Fisher et al., 2012). Cannon-Bowers and her colleagues (1993) believe that 
the construct of “shared mental models” would explain the compatibility of team members’ expectations 
and thus lead to high quality of team decision making and team performance. They further argue that four 
types of mental models are useful for team effectiveness: (1) equipment model (i.e., technology), (2) task 
model, (3) team interaction model, and (4) team model. The fourth component, team model is concerned 
with teammates’ knowledge, skills, abilities, preferences and tendencies. 

Empirical studies specifically relevant to gender and racial diversity have produced inconsistent results. 
Racial diversity was found to be negatively associated with team mental model similarity (Fisher et al., 
2012) and team schema agreement (Rentsch & Klimoski, 2001), whereas gender was not found to have 
significant correlations. On the other hand, when studying the supervisors’ performance rating of their 
subordinates, Tui and O’Reilly (1989) found that “the subordinates in mixed-gender dyads were rated as 
performing more poorly and were liked less well than the subordinates in the same gender dyads” (p. 414). 
The authors found no support for a race effect. Yet, Kraiger and Ford’s (1985) meta-analysis research 
indicates a significant ratee race effect in performance ratings: higher performance scores were given to people 
of raters’ own race. Meanwhile, empirical work on cross-country/culture collaboration (e.g., Binder, 2007; 
Xie, Song, & Stringfellow, 1998; Zagorsek, Jaklic & Stough, 2004) has also compared participants of different 
countries based on cultural dimensions such as power distance, uncertainty avoidance, individualism and 
collectivism, long term orientation, and more as per Hofstede (2001). 

Overall, a multitude of empirical constructs and frameworks has been developed to describe 
collaborative learning on interactive tabletops or the diversity factors which impact team process or 
performance. However, seldom has any empirical investigation been performed on library users’ collaborative 
discovery process using interactive tabletops. Furthermore, seldom have any studies linked team diversity 
attributes of different levels (surface versus deep level) with collaboration profiles, quality of collaboration 
processes, and team performance. In addition, team members’ suggestions and their 
acceptance or non-acceptance have never been operationalized as a variable to be associated with 
diversity attributes or team performance measures. 

3 Research Questions 


Focusing exclusively on the dyadic collaboration process, the authors of this paper seek to explore the relationships 
among diversity attributes, collaboration style and process, and team performance. The relevant research 
questions are: 

1. How do various types of diversity affect the formation of dyadic collaboration style? 

2. How do various types of diversity affect the quality of dyadic collaboration? 

3. How do various types of diversity affect the number of suggestions made by a team member 
and whether the suggestions were accepted or not accepted? 

4. How do various types of diversity affect participants’ ratings of the system’s usability, usefulness, 
future use and their likelihood to recommend the tool? 

5. Do various types of diversity correlate with team performance in terms of both efficiency and 
effectiveness? 


4 Research Variables 


Relevant to this paper, a number of variables pertaining to diversity, team process, performance, and user 
perceptions are considered. These variables are listed in Table 4. 


Categories                    Items 
Diversity                     Surface Level: Gender; Race; Native language 
                              Academic: Discipline; Status/class 
                              Use Experience: Tablet experience; Tabletop experience 
Team Collaboration Process    Collaboration Style: Turn Taker; Driver Navigator; Driver Passenger; Independent 
                              Collaboration Quality: Sustaining mutual understanding; Dialog management; 
                                Information pooling; Reaching consensus; Task division; Technical coordination; 
                                Reciprocal interaction; Individual task orientation 
                              Suggestions: Suggestion accepted; Suggestion not accepted; Suggestion negotiated; 
                                Suggestion ignored; Suggestion disagreed 
Team Performance              Efficiency: Time on task 
                              Effectiveness: Percent of tasks completed successfully; Percent of tasks completed 
                                with ease; Percent of tasks completed with difficulty; Percent of tasks failed to complete 
Perception Rating             Overall usability experience; Usefulness; Likelihood for future use; 
                              Likelihood to recommend 


Table 4: Research Variables Relevant to Diversity, Collaboration, Performance, and Perception 

5 Method 


Thirty usability study sessions took place at the university’s science library from October to December 2012, 
in a small staff training room on the second floor of the library. The sessions were video recorded with two 
cameras, one connected with Morae software to capture the SUR40 screen as well as participants’ hand 
movements, the other linking to a laptop to record participants from another angle. 

Undergraduate students were recruited as the target participants. Participants were informed as 
they signed up for a session that they would be working in a two-person team. As a participant signed up, 
he/she was paired up with another person according to specified date/time preferences. An effort was made 
to maintain an equal number of three pairings: female and female, male and male, and female and male as 
the mixed team. In the end, there were 11 female teams, 11 mixed teams and 8 male teams. 

Each research session consisted of three parts: (1) a pre-session interview, (2) a usability test, and 
(3) a post-session interview. The pre-session interview asked for participants’ demographic information, 
their use of tablet and touch screen devices, their library use pattern, and their vision of the characteristics 
of an ideal multi-touch tabletop. The post-session interview inquired about participants’ overall usability 
experience with the SUR40 and LE, their favorite and least favorite features, whether they learned new 
things about the library, and whether their experience matched their imagined ideal tabletop. 

The actual usability test involved going through a scenario to perform eight tasks on LE. Each task 
consisted of three to four subtasks (see Appendix A for the full usability test scenario containing tasks and 
subtasks). Participants were encouraged to think aloud while completing tasks, with the moderator sitting 
next to them. The screen movements, the sound, and participants’ behavior were logged by Morae Recorder. 
The body language and sound were also captured from a different angle via Quicktime recording. Study 
sessions were run by research assistants, with the second author as a consistent moderator for all sessions 
and additional research assistants as note-takers and data loggers. On average, a research session lasted 34 
minutes. Participants were each awarded a $15 gift card. 


6 Data Processing, Coding and Analysis 


Data processing involved transcribing pre- and post-session interview data into a spreadsheet that served as the 
coding sheet. After the initial coding was done, further processing was carried out to record the team average 
scores. For instance, a team’s overall usability experience score was obtained as the average of the ratings 
given by the two persons in that team. 

For collaboration related variables, the collaboration profile typology (Shaer et al., 2011) was used 
to code each task for a dyadic session. The collaboration style for the overall session was also coded. The 
collaboration process scale (Meier et al., 2007) was used to assess the quality of collaboration using a five 
point scale ranging from -2 (very bad) to +2 (very good). The quality of collaboration was coded by session, 
not by individual tasks. Since participants were not given a time limit, the attribute of “time management” 
was not included in the coding, making for a total of eight attributes. 

For each session, any occurrence of suggestion was coded through the Morae marker feature where 
suggestions were grouped into the format of verbalized, gestural, and verbal and gestural combined based 
on the suggestor’s behavior. Based on the suggestee’s reaction, suggestions were coded into four types: 
suggestion accepted, suggestion negotiated, suggestion ignored and suggestion disagreed. Because of the 
limited frequency of the suggestions other than those accepted, in the analysis, suggestions “negotiated,” 
“ignored,” and “disagreed” were all grouped into the category of “suggestion not accepted.” 
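
The collapsing of reaction codes described above can be sketched in a few lines. This is only an illustration: the dictionary name, the function name, and the sample reactions are hypothetical and are not taken from the study's coding sheet.

```python
from collections import Counter

# The four reaction codes named in the text, mapped to the two categories
# used in the analysis. The short labels are illustrative assumptions.
RECODE = {
    "accepted": "accepted",
    "negotiated": "not accepted",
    "ignored": "not accepted",
    "disagreed": "not accepted",
}

def collapse(reactions):
    """Count suggestions per analysis category after collapsing the codes."""
    return Counter(RECODE[reaction] for reaction in reactions)

# Hypothetical coded reactions for one session (not data from the study).
session = ["accepted", "ignored", "accepted", "negotiated"]
print(collapse(session))
```

Keeping the original four codes and collapsing them only at analysis time would leave the finer-grained coding available for later use.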

The coding process involved two researchers watching the video (the two authors and a research assistant 
alternated as the two coders), deciding on the dyadic 
profile for individual tasks first, and then deciding on an overall profile for that session. After the profile coding 
for a given session was done, the researchers went on to assess the quality of collaboration in the 
session they had just reviewed, using the five point scale. 

A second round of coding was performed to check the reliability of the first round coding. In the 
second round, 44% of the whole data set was randomly sampled, and researchers coded the sampled data 
for collaboration profile and collaboration quality independently. The intercoder agreement between round two 
and round one, based on Cohen’s (1960) kappa statistic, was .63 for collaboration styles, .50 for 
collaboration quality, and .83 for suggestions. According to the scholarly literature on the measurement of 
inter-rater agreement (e.g., Altman, 1991; Viera & Garrett, 2005), a kappa value in the range from .81 
to .99 is considered “almost perfect agreement,” from .61 to .80 “substantial agreement,” and 
from .41 to .60 “moderate agreement.” The researchers were thus in almost perfect agreement in coding the 
occurrences of suggestions and in substantial agreement in coding collaboration styles, but only in moderate 
agreement in giving collaboration quality scores. Note that with a numeric rating of collaboration quality, it is 
more difficult to arrive at exactly the same score than with semantically coded categories. Note also that 
the first round of coding, which was used in the data analysis, was the result of two coders working together 
to reach consensus. Consequently, the coded data should be viewed as solid with the proper level of inter-coder 
agreement. 
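
For readers unfamiliar with the statistic, the following minimal sketch shows how Cohen's kappa can be computed for two sets of categorical codes and then labelled with the bands quoted above. The two code lists are hypothetical, the function names are illustrative, and the "below moderate" label for values under .41 is an assumption, since the text does not name that range.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    n = len(coder1)
    # Observed agreement: share of items given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement: product of the coders' marginal proportions per category.
    c1, c2 = Counter(coder1), Counter(coder2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def agreement_label(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"  # assumption: range not named in the text

# Hypothetical style codes from two coding rounds (not data from the study).
round1 = ["TT", "TT", "DP", "DN", "TT", "ID", "DP", "TT", "DP", "TT"]
round2 = ["TT", "TT", "DP", "DP", "TT", "ID", "DP", "TT", "TT", "TT"]
kappa = cohens_kappa(round1, round2)
print(round(kappa, 2), agreement_label(kappa))

# Applying the bands to the kappa values reported in the text.
for name, value in [("styles", 0.63), ("quality", 0.50), ("suggestions", 0.83)]:
    print(name, agreement_label(value))
```

In the hypothetical example the coders agree on 8 of 10 items, but because chance agreement is .40, kappa comes out at about .67 rather than .80.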


7 Results 


In this section, a brief demographic description of the participants, a report of participants’ post-session 
perception of the interface, and various statistical results pertaining to the relationships among a number 
of diversity attributes, collaboration style, quality of collaboration process, team performance indicators, 
and participants’ perceptions will be presented. 


7.1 Participants 


Among the 60 participants, 33 were female and 27 were male. The racial makeup of each team was coded 
simply into two categories: same (n = 12) and different (n = 18). Fifty-two participants were native 
English speakers while 8 were not. Consequently, both members of 22 teams were native English speakers, 
whereas 8 teams were composed of one native speaker and one non-native speaker. Fifty-nine participants 
were between 18 and 23 years of age, whereas one participant was in the age range of 24 to 34, so 
age varied little among participants. Table 5 summarizes the 
demographic makeup of the 30 dyads. 


Team Diversity Attribute    Category                          Teams 
Gender                      Female & Female                   11 
                            Male & Male                       8 
                            Female & Male                     11 
Racial Composition          Same                              12 
                            Different                         18 
Native Language             Both Native English Speakers      22 
                            One Native & One Non-Native       8 


Table 5: Demographic Makeup of the 30 Dyads 


Over 51% of the participants were freshmen, 23% were sophomores, and juniors and seniors each made up 12%. In 
terms of academic major, the largest group specialized in sciences, biomedical sciences and 
engineering (38%), while the social sciences (23%) and humanities (12%) were also well represented. Sixteen 
participants (27%) had not yet declared their majors. All but one of the participants were between 18 
and 23 years old. 

Participants were asked about their experiences in using tablet devices and smart phones. Over 
70% had used an iPhone and over 55% had used an iPad. Figure 1 illustrates their responses. Among 
the students who owned or used an iPhone, the largest share had used it for about two years (29%), while some 
had used it for more than two years (18%) and others for more than six months but less than a year (18%). In 
terms of their experience with the SUR40 or other types of multi-touch interactive tabletops, the majority of the 
participants had not used any type of interactive tabletop (72%), 14 participants (23%) had used the SUR40 at 
various library hosted events, and 3 participants (5%) had used other types of interactive tabletops. 


[Figure omitted. Device categories shown: iPhone, iPad, Blackberry, Android, Nook, Kindle Fire, Kindle, None, 
Nexus 7, HP Tablet, Google Devices, iPod, iTouch.] 

Figure 1: Participants' experiences with tablet devices and smart phones. 


7.2 Post-Session Perceptions 


Participants rated their overall usability experience after they completed eight pre-defined tasks. They also 
commented on a variety of things related to the use of the SUR40 and LE. When they were asked to rate 
certain aspects of their experience, a seven-point scale was consistently used, with “7” being the highest 
value and “1” being the lowest. The average ratings were 4.4 for overall usability experience, 4.1 for 
usefulness, 3.8 for likelihood to use LE in the future, and 4.3 for likelihood to recommend the tool. 


7.3 Collaboration Styles (Profiles) 


All four collaboration styles (or profiles as per Shaer et al., 2011) were observed in the 30 sessions. 
Collaboration styles were coded by task and then by session. When looking at task-based collaboration 
styles as well as overall collaboration profiles by session, the most frequently used style at both levels was 
“turn-taker,” followed by “driver passenger.” Note that out of 30 dyads, one pair failed to complete the 
task of “taking a snapshot” and subsequently had to skip the next task. Consequently, the total number of 
task-based collaboration styles coded is 239 instead of 240 (30 sessions × 8 tasks). Table 6 provides counts of each 
style at both the task level and the session level. 


Turn Takers Driver-Passenger Driver-Navigator Independent 
(TT) (DP) (DN) (ID) 
By Task 154 (64%) 54 (23%) 18 (8%) 13 (5%) 
By Session 15 (50%) 11 (37%) 1 (3%) 3 (10%) 


Table 6: Frequency of Collaboration Style Observed By Task and By Session 

7.4 Diversity Attributes and Collaboration Styles 


7.4.1 Surface Level Diversity and Collaboration Styles 


A Chi-square test based on the gender pattern of the team make-up and the frequency of collaboration styles 
used for individual tasks revealed a significant dependence of the collaboration style used on the kind of 
team make-up (χ²(6, N = 239) = 17.97, p < .01). Table 7 shows the frequency of collaboration styles for the 
different pairings. Whereas the turn takers (TT) style had the largest representation in all three kinds of teams, the 
mixed groups had no occurrence of the “independent” (ID) style. The male groups had a larger portion of 
the “driver-navigator” (DN) style, although the “driver-passenger” (DP) style was used the second most 
frequently in all three kinds of teams. Meanwhile, the three gender-based team make-ups did not differ 
significantly in the quality scores of any of the eight attributes of the collaboration process. 


Gender Diversity Collaboration Style 

TT DN DP ID 
Female & Female 58 (66%) 2 (2%) 21 (24%) 7 (8%) 
Male & Male 36 (56%) 10 (16%) 12 (19%) 6 (9%) 
Female & Male 60 (69%) 6 (7%) 21 (24%) 0 (0%) 


Table 7: Team gender make-up and collaboration style 
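
As a worked check, the Chi-square statistic for the gender make-up analysis can be recomputed directly from the counts in Table 7. The sketch below uses only the standard library; the function names are illustrative, and the closed-form p-value is valid only for even degrees of freedom, which happens to apply here (df = 6). It is not a description of the software the authors used.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_p_even_df(stat, df):
    """Upper-tail p-value; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Counts from Table 7; columns are TT, DN, DP, ID.
table7 = [
    [58, 2, 21, 7],   # Female & Female
    [36, 10, 12, 6],  # Male & Male
    [60, 6, 21, 0],   # Female & Male
]
stat, df = chi_square(table7)
p = chi_square_p_even_df(stat, df)
print(f"chi2({df}, N = {sum(map(sum, table7))}) = {stat:.2f}, p = {p:.4f}")
```

Run on the Table 7 counts, the statistic rounds to the reported 17.97 with 6 degrees of freedom, and the p-value falls below .01.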


The 30 dyads were grouped into “same” and “different” to distinguish the racial diversity of the team. Even 
though both groups used the TT style most frequently (same 75%; different 57%), significant 
differences were found (χ²(3, N = 239) = 16.08, p < .01): racially diverse groups used the 
DP style more (29%) than racially homogeneous groups (14%), and they also used the “independent” style 
more (8%) than homogeneous groups (1%). Dyads of the same racial origin used the DN style more (10%) 
than dyads that were different (6%). Figure 2 displays the raw frequency counts. 

With regard to native language diversity, the frequency of collaboration style use was significantly 
dependent on whether the team had two native English speakers or was mixed (χ²(3, N = 239) = 15.47, p 
< .01). While the teams made up of one native and one non-native speaker more frequently used 
the ID (14%, compared to 2% for the both-native group) and DN styles (11%, compared to 6% for 
the both-native group), the teams made up of two native English speakers more frequently used 
the TT (67%, compared to 57% for the mixed group) and the DP (24%, compared to 18% for the mixed 
group) styles. 



iConference 2014 Rong Tang & Elizabeth Quigley 


[Figure omitted; axis label: CollaborationStyle.] 

Figure 2: Racial diversity and collaboration style use by task. 


7.4.2 Academic Status/Interests and Use Experience Diversity and Collaboration Style 


The academic diversity aspect is measured through the differences in participants’ academic status and 
areas of interest. In nine teams both members had the same status, while in 21 they did not. No significant 
dependency of collaboration style on class level diversity was found. 

Fields of study were coded according to academic domains of sciences, social sciences and humanities. 
Further coding of disciplinary diversity grouped participants into the category of “same” (both from the 
same academic domains of sciences, social sciences, arts and humanities), “mixed” (e.g., one from science, 
one from the humanities), and “both undeclared” (both participants had not yet declared their majors). 
There were 20 dyads whose members came from different domains and five whose members came from the same 
domain; in the remaining five teams, neither participant had declared a major. 

Disciplinary diversity was significantly associated with the adoption of task-based collaboration styles (χ²(6, 
N = 239) = 33.38, p < .01). While all three kinds of teams used the TT style most frequently, 
teams in which both members had undeclared majors had the highest percentage of usage (83%), whereas teams 
with members from the same domain of academic interest had the lowest frequency in using the TT style 
(45%). Dyads with mixed areas of interest had a high frequency of the DP style (25%), whereas teams with the 
same academic interests applied the independent style as often as the DP style (23% each). Teams whose members 
were both undeclared never used the independent style. 


Same Domain Mixed Domain Both Undeclared 
Count Percent Count Percent Count Percent 
TT 18 45% 103 65% 33 83% 
DN 4 10% 12 8% 2 5% 
DP 9 23% 40 25% 5 13% 
ID 9 23% 4 3% 0 0% 


Table 8: Disciplinary Diversity and Collaboration Styles 


Participants’ experience with interactive tabletops may be grouped into three categories: both had used the 
SUR40 or other types of tabletops prior to the study session; neither of them had any experience with 
interactive tabletops; and one had experience and the other did not. In the majority of the dyads (60%) neither 
member had used the technology, seven teams (23%) had mixed experience (i.e., one had used it and one had not), and 
both members of five teams (17%) had used interactive tabletops. 

Teams with different levels of tabletop use experience adopted the collaboration styles with 
significantly different frequencies (χ²(6, N = 239) = 18.72, p < .01). While the teams in which both members 
had used interactive tabletops before never applied the ID style, they had a significantly higher frequency 
of using the DN style (20%). Dyads in which neither member had previously used interactive tabletops 
frequently used the DP style (28%), whereas the ID style was used most frequently by the teams with 
mixed use experience. 

Participants’ post-session ratings of their overall experience of LE’s usability were coded by the 
paired discrepancy in their ratings. While the majority of the rating discrepancies was a one point difference 
(16 teams, 53%), members of 11 teams gave the same rating (37%). The remaining teams had 2, 2.5 or 4 
point differences in the rating. Correlation analysis revealed that such rating differences are negatively 
associated with the use of the TT style (r = -.57, p < .01) and positively associated with the use of the DP 
style (r = .67, p < .01). 
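
The two team-level quantities derived from individual ratings, the team average and the paired rating discrepancy, can be illustrated with a short sketch. The ratings below are hypothetical values on the seven-point scale, and the function names are illustrative only.

```python
def team_mean(rating_a, rating_b):
    """Team score: the average of the two members' ratings."""
    return (rating_a + rating_b) / 2

def rating_discrepancy(rating_a, rating_b):
    """Paired discrepancy: the absolute difference between the two ratings."""
    return abs(rating_a - rating_b)

# Hypothetical pairs of ratings (not data from the study).
pairs = [(5, 4), (6, 6), (3, 7)]
for a, b in pairs:
    print(f"ratings {a} and {b}: mean {team_mean(a, b)}, discrepancy {rating_discrepancy(a, b)}")
```

Two teams can share the same average while differing in discrepancy, which is why the discrepancy is treated as a separate diversity attribute.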


7.4.3 Suggestion Behavior and Collaboration Style 


Throughout the study sessions, participants were observed making suggestions to their partner when they 
encountered a difficult task. Suggestions occurred in 29 of the 30 sessions. The 
total number of suggestions made was 77, of which 57 were accepted and 20 were not. Forty-three 
suggestions were made in verbal format, 17 were gestural only, and 18 combined verbalization 
with gestures. Thirteen pairs made 1 suggestion during their study sessions, whereas 7 
sessions had 2 suggestions, and 4 sessions had 6 suggestions. 

In the case when the suggestion was not accepted, there was a difference in opinion about the 
problem solving idea. In such a situation, the non-accepted suggestions may be viewed as an indicator of 
diversity in opinions or perspectives, as opposed to the notion of the “shared cognition” (Cannon-Bowers & 
Salas, 2001). The frequency of suggestions not accepted was found to be positively associated with the 
frequency of the pair using the “driver navigator” style (r = .54, p < .01). 


7.5 Diversity Attributes and the Quality of Collaboration Process 


Among the three surface level diversity variables, gender and native language diversity showed no effect on 
the ratings of collaboration quality. Teams of different racial make-up, however, differed significantly 
on most collaboration process ratings, with the exception of “task division” and “individual 
task orientation.” Teams with both members of the same race had higher ratings in terms of “Sustaining 
Mutual Understanding” (t(28) = 2.42, p < .05); “Dialogue Management” (t(28) = 2.95, p < .01); 
“Information Pooling” (t(28) = 3.45, p < .01); “Reaching Consensus” (t(28) = 2.45, p < .05); “Technical 
Coordination” (t(28) = 2.25, p < .05); and “Reciprocal Interaction” (t(28) = 2.50, p < .05). Figure 3 shows 
the differences in means of the teams of same racial make-up versus diverse make-up. 

[Figure omitted; chart title: Racial Diversity and Average Collaboration Quality Rating.] 

Figure 3: Racial diversity and collaboration quality rating. 


Beyond surface level diversity, the discrepancy in team members’ overall usability ratings was 
significantly associated with several collaboration process ratings, with the exception 
of the ratings for “Sustaining Mutual Understanding” and “Individual Task Orientation.” The rating 
discrepancy was negatively correlated with “Dialogue Management” (r = -.40, p < .05); “Information Pooling” 
(r = -.43, p < .05); “Reaching Consensus” (r = -.43, p < .05); “Task Division” (r = -.38, p < .05); and 
“Reciprocal Interaction” (r = -.57, p < .01). 
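
The significance levels attached to these correlations can be checked with the usual conversion of a Pearson r into a t statistic with n - 2 degrees of freedom. The sketch below assumes that all 30 dyads entered each correlation, which the text does not state explicitly; the critical values are standard two-tailed table values for df = 28, and the function name is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

N_DYADS = 30       # assumption: every dyad contributed to each correlation
T_CRIT_05 = 2.048  # two-tailed critical t for df = 28, alpha = .05
T_CRIT_01 = 2.763  # two-tailed critical t for df = 28, alpha = .01

# Correlations with the rating discrepancy, as reported in the text.
reported = {
    "Dialogue Management": -0.40,
    "Information Pooling": -0.43,
    "Reaching Consensus": -0.43,
    "Task Division": -0.38,
    "Reciprocal Interaction": -0.57,
}
for name, r in reported.items():
    t = t_from_r(r, N_DYADS)
    level = ".01" if abs(t) > T_CRIT_01 else ".05" if abs(t) > T_CRIT_05 else "n.s."
    print(f"{name}: r = {r:.2f}, t(28) = {t:.2f}, significant at {level}")
```

Under that assumption, each reported coefficient clears the .05 threshold, and only the coefficient for “Reciprocal Interaction” clears the .01 threshold, which matches the levels given in the text.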


7.6 Diversity Attributes and Suggestion Behavior 


Interesting patterns emerged when surface level diversity variables were considered together with dyadic 
suggestion behavior. Significant gender effects were observed both in the total number of 
suggestions made (F(2, 27) = 3.71, p < .05) and in suggestions not being accepted (F(2, 27) = 9.95, p < .01). 
While male only teams made on average the highest number of suggestions, they also had on average 
the highest number of suggestions not accepted. In contrast, female pairs both made the lowest number of 
suggestions and had the lowest non-acceptance frequency. The mixed gender group fell in between 
the male and female only teams. Figure 4 is the means plot for suggestions not accepted. The means plot 
for the total number of suggestions shows a pattern very similar to the one in Figure 4. 

With regard to the format of suggestions, teams with mixed native speakers had significantly higher 
frequency in verbalizing their suggestions than pairs with both members as native English speakers (t(28) 
= 2.37, p < .05). 

Figure 4: Gender diversity and number of suggestions not accepted. 


7.7 Diversity Attributes and Perception Rating 


Gender make-up had a significant influence on the recommendation rating (F(2, 27) = 3.92, p < .05). In 
contrast with the patterns in suggestion behavior, male teams gave a lower rating for recommendation than 
female teams. The mixed gender teams were once again in between male only and female only teams. Such 
an inverse pattern would make sense as male teams were found to be more critical than female dyads. 

Participants’ use experience with tablet devices was grouped into the categories of (1) both used 
iDevices (e.g., iPad, iPhone, iTouch, etc.), (2) one used iDevices and the other used other types of tablets, 
and (3) one used iDevices and the other had not yet used any tablet devices. In a majority of the teams 
both participants had used various Apple products (n = 22), followed by iDevices and other 
devices (n = 6), and iDevices and none (n = 2). In response to the question “Based on your experience today, 
on a scale of 1 to 7, with 7 being extremely useful, how useful do you think Library Explorer is to you in 
your learning and creative use of library resources?”, the three tablet use experience groups differed 
significantly in their usefulness rating (F(2, 27) = 4.10, p < .05). Teams with both members 
having used iDevices gave the highest usefulness rating, whereas the groups in which one member had used 
iDevices and one had never used any tablet gave the lowest usefulness rating. Figure 5 presents the means plot 
of the usefulness rating by groups with different use experiences. 

Figure 5: Tablet use diversity and participants’ usefulness rating 


7.8 Diversity Attributes, Suggestions, and Team Performance 


As shown in Table 4, team performance is specified through the variable 
of task time as an indicator of “efficiency,” and four “effectiveness” measures: percent of tasks completed 
successfully, percent of tasks completed with ease, percent completed with difficulty, and percent failed to complete. 


7.8.1 Diversity and Task Time 


Among the surface level diversity variables, the only variable that has significant impact on task completion 
time is the native language diversity (t(28) = -2.50, p < .05). On average teams with both native English 
speakers completed various subtasks significantly faster (M = 36.36, SD = 7.40) than teams with mixed 
native speakers (M = 48.02, SD = 18.61). 
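
The reported t value can be reproduced from the summary statistics alone. The sketch below assumes a pooled-variance (equal variances) independent-samples t-test, which is consistent with the reported 28 degrees of freedom but is not stated in the text; group sizes of 22 and 8 teams are taken from Section 7.1, and the function name is illustrative.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t (equal variances assumed) from group means, SDs and sizes."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Both-native teams (n = 22) versus mixed teams (n = 8); means and SDs from the text.
t, df = pooled_t(36.36, 7.40, 22, 48.02, 18.61, 8)
print(f"t({df}) = {t:.2f}")
```

The result rounds to the reported t(28) = -2.50. Given how different the two standard deviations are, a test that does not assume equal variances could yield a different value.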

The total number of suggestions correlates positively with the task time (r = .44, p < .05). The 
frequency of suggestions being accepted also correlates positively with task time (r = .45, p < .05). 


7.8.2 Diversity and Effectiveness 


Once again, among the surface level diversity variables, the only variable that has a significant impact on 
task effectiveness is native language diversity. Teams with both members as native English speakers 
completed various subtasks with significantly higher ratios of ease than teams with mixed native speakers 
(t(28) = 3.21, p < .01); they also had lower ratios of tasks failed to complete (t(28) = -2.27, p < .05), and 
higher ratios of tasks completed successfully (t(28) = 2.07, p < .05). 

Diversity in academic status had an impact on the percent of tasks completed with difficulty 
(t(28) = -2.18, p < .05). Teams with both members from the same class level (i.e., freshman, sophomore, 
junior, or senior) experienced a significantly lower percentage of task difficulty (M = 1.4%, SD = .03) 
than those teams with members of mixed class levels (M = 5.1%, SD = .05). 

Participants’ previous experience with an interactive tabletop also made a difference in their task 
success ratios (F(2, 27) = 5.00, p < .05). Teams in which neither member had used an interactive tabletop before 
the research session (M = 99%, SD = .02) and teams in which both members had used the SUR40 or other 
types of tabletops (M = 98%, SD = .02) had significantly higher percentages of task success than teams 
that had mixed tabletop use experiences (M = 95%, SD = .06). Figure 6 shows the means plot of the 
success ratio for groups with different levels of experience in using tabletops. This might suggest that 
teams with the same tabletop use experience (both used or neither used) can formulate a shared mental model 
more efficiently, whereas those with mixed experiences would require a longer time to develop a common 
mental model they both would accept and use for performing tasks. 


[Figure omitted: means plot. Y-axis: Mean of PercentSuccess (0.94 to 1.00); x-axis groups: both used, 
one used and one not, both not used.] 

Figure 6: Tabletop use diversity and percent of tasks completed successfully. 


The total number of suggestions made is positively associated with the percent of tasks failed to complete 
(r = .42, p < .05) and negatively associated with the percent of action tasks completed successfully (r = 
-.48, p < .01). The frequency of suggestions not being accepted is negatively correlated with the percent of 
action-based tasks completed successfully (r = -.43, p < .05). 

The differences in dyadic ratings of overall usability were found to be negatively associated with 
the percent of tasks completed successfully (r = -.57, p < .01). 


8 Discussion 


A number of interesting patterns emerged from examining various diversity attributes and their impacts on 
dyadic collaboration styles, collaboration quality, problem-solving suggestions, task performance, and 
participants’ perceptions. With regard to research question 1, most of the diversity variables, with the 
exception of diversity in academic status, were significantly associated with variations in the use of 
collaboration styles. There is no consistent direction, but with the exception of native language diversity, 
the mixed groups in gender, race, and discipline applied the “driver passenger” (DP) style more frequently. 
Differences in participants’ usability ratings are also positively associated with the frequency of the DP 
style. Various measures of suggestions, including suggestions not accepted, are a good indicator of the 
“driver navigator” (DN) style. This seems to make sense, as one would assume that there would be more 
suggestions from the navigator, and that the suggestions may be discussed, negotiated or disputed between the 
driver and the navigator. 

Diversity attributes do not seem to have a strong connection with collaboration quality. While 
racial diversity and differences in participants’ usability rating were found to be both negatively associated 
with several dimensions of collaboration quality, gender, native language, discipline, status, use experience, 
and suggestions were not significantly linked with collaboration quality ratings. 

There were gender and native language effects on problem solving suggestions. The former is linked 
to the total number of suggestions made and suggestions not being accepted, and the latter is connected 
with the format in which the suggestion was delivered. It is interesting that the mixed gender dyads were 
in between the male and female pairs with regard to the number of suggestions made and the number of 
suggestions not accepted. It seems that a mixed gender team can strike a good balance between suggestions 
being made and suggestions not being accepted. 

Consistent with the fact that male dyads offered more suggestions and debated the suggestions, 
they also gave the lowest ratings for recommendation. Female dyads were the least likely to reject a 
suggestion, and they gave the highest rating for recommending the tool. Teams in which both members had prior 
experience with Apple tablet products gave the highest usefulness rating for the SUR40 and LE, whereas groups 
with mixed experience (one used and one not used) gave the lowest rating. 

Only one diversity variable was related to task completion time; however, more underlying
diversity variables, such as differences in academic status, tabletop use experience, usability rating, and
suggestions not accepted, are associated with effectiveness measures such as success ratio and the percentage of tasks
that were not completed or were completed with difficulty. In most cases, the dissimilar groups (mixed class levels,
mixed experiences in tabletop use, suggestions not accepted) appeared to be associated negatively with the
percentage of tasks completed successfully. For instance, in terms of tabletop use experience, the two kinds of
“same” use experience teams (i.e., both members had never used a tabletop or both members had used one) had significantly higher
success rates than the teams in which one member had used tabletops before and one had not. Based on the theory
and research on team schema agreement or mental model similarity, teams with mixed experience would
likely have some difficulty instantly forming a shared mental model and could thereby experience
a reduced success rate.

Overall, the results seem to confirm the “similarity attract” paradigm (Byrne, 1971). However, while diversity
attributes at all levels seem to influence the collaboration style, the significant diversity impact on team
performance came mostly from underlying attributes such as tabletop use experience, suggestions not
accepted, and usability rating. It is understandable that surface-level diversity attributes would influence
how teams interact and collaborate without being directly associated with task performance outcomes. It
is the substantive underlying diversity attributes, be they experience, problem solving skills, or team
members’ opinions, that are the antecedents of similarity in the team mental model, and therefore of team
cohesiveness. These diversity attributes have been viewed as highly task-related (Pelled, 1996; Pelled et al.,
1999; Simmons et al., 1999), and they would therefore determine team performance.

Native language diversity is an exception that exhibited direct associations with both team 
efficiency and effectiveness. In this particular study, the language variable might have served as an indicator 
of whether the communication would be carried out smoothly and effectively to complete tasks in a 
controlled setting. However, there was no significant difference between the two groups (both native 
speakers versus one native and one non-native speaker) on their collaboration quality ratings. 


9 Conclusions 


The results on diversity attributes and collaboration patterns of 30 dyads working with the LE
interface on the SUR40 confirmed some aspects of previous empirical theories on team composition diversity and
team mental model similarity, while raising new questions that require further investigation
and confirmation. A notable original contribution of this study is the integration of tabletop collaborative learning
frameworks with theories of team composition diversity, team mental models, and team performance.
Another innovative perspective that this paper introduces is the empirical construct of problem solving
suggestions. The analysis of suggestion behavior produced significant insights that could lead to a greater
understanding of the collaborative discovery process on interactive tabletops.

Even though examining team composition diversity and collaboration behavior is not the only goal
of the Phase II study, it is a valuable phenomenon that is worthy of substantial and advanced
analysis. In another paper (Tang, Quigley, Guillette, & Erdmann, 2013), we reported the impact of
collaboration style on collaboration quality and team effectiveness; this paper focuses on the effect of dyadic
diversity. One of the major limitations of the present study is that in both the pre- and post-session interviews,
no specific inquiries were made regarding participants’ past experience or how they felt about the quality
of the collaboration when they used the SUR40 and LE. Nevertheless, further analysis of participants’
qualitative preference data, their verbal exchanges, and their coordination activities as per Pinelle et al.’s
(2003) model is currently ongoing. Additional qualitative analysis of participants’ physical behavior, such
as their hand gesture types and screen touch rates, their territoriality (i.e., spatial partitioning), and
conflict as per Tang and his colleagues (Tang et al., 2010), is also planned. Advanced qualitative analysis
will help to reveal deeper insights into dyadic collaboration processes that purely quantitative results,
such as the ones presented in this paper, cannot. As more and more libraries adopt new devices for
their users, more similar devices will likely be developed to support group learning and
collaborative discovery. Understanding team composition diversity and the multiplicity of collaboration
factors and their interplay will enable us to better serve our users as collaborative learners.


13 Appendix A. Usability Test Scenario: SUR40 & Library Explorer


For this study, you and a partner who is assigned to work with you in this research session will be interacting 
with a touch table called the Microsoft Surface (SUR40) and using software developed by the Harvard 
Library and Brown University. For the test scenario, there are no right or wrong answers. We are not 
testing your knowledge or skills. We are simply interested in learning how you interact with the technologies 
presented in the test. We are also interested in learning how you and your partner collaborate to figure out
how to complete various tasks.

As an undergraduate student at Harvard University, you are familiar with the Cabot Science Library and
have used the facility and collection in the past. You and your partner are both taking a course at Harvard 
University and your professor has asked the class to participate in a special two person team project. You 
and your partner (who is your teammate) are to browse selected images from the Harvard Library’s digital 
collections by using Cabot’s SUR40 and create a short report of your findings. 

NOTE: You will be performing 8 tasks, and each contains 1 to 4 questions. As you perform each task
according to the instructions, please remember to say aloud what you are thinking, and answer the
questions for each task verbally. Any questions?


TASK 1. You wish to find an application on the SUR40 table that will help you browse the digital 
collections of the Harvard Library. You turn the table on (or wake it from sleep mode) and find the icon 
menu screen. 


a. What do you do to get to the icon menu? 


b. Which application would you choose to browse the library’s digital collections? 


c. What tells you that you have found the right application? 


TASK 2. Launch the Library Explorer application. From now on, we will refer to the application screen that
you will be interacting with as the “interface.”
a. What particular part of the interface would you use to search and filter images with? 


b. Select a category or date from the filter menu. Now, list the interface options that are available to you,
in different parts of the interface. 


Options: 


c. Select three different categories. [Have participants try out different categories so as to
add more images to the screen.] In the filtered result set of images (the thumbnails displayed in
the center of the screen), what particular action would allow you to see a description of one of the
images?


What tells you that you have found the image description section of the
interface?


TASK 3. Your professor has instructed you to use the Medieval Chart image for the project. Find the
image using the category filter and select the image from the results. From the image description section, 
open the image in full screen. 

a. What action did you take to open the image in full screen? 


b. Explore and find out what options are available to you in the current (full screen) view. 


c. Now, zoom in and out of the image. What action did you take to zoom in and out? 


TASK 4. Learn more about the image by using hotspots. 
a. What do you think hotspots are? 
b. Turn the hotspots on and off. What did you use to do this? 
c. How do you view individual hotspots on the image? 
d. What is displayed in the hotspots?
View and read all the hotspots in the medieval chart image.


TASK 5. Your professor has asked you to take a snapshot of a section of the medieval chart image for
your project. 
a. Where would you go to find information on how to take a snapshot? 
b. Identify a section of the medieval chart image and take a snapshot of it. How did you take the
snapshot? [Moderator asks each participant to try taking the snapshot]
c. You decided that the snapshot is too dark and would like to discard it. How would you discard
the image (remove it from the screen)?


TASK 6. You discarded the snapshot that was too dark. Instead, you wish to take a brighter snapshot 
of the same section of the image and send it to yourself. 


a. Adjust the brightness of the image. Which part of the interface did you use? 
b. Identify the same section of the image you selected before and take a snapshot of it. 
Make a note of why you took the snapshot. What did you use to add the note? 


c. Add your email address and send the snapshot and note to yourself, for the class project. How did 
you do that? 
Note that the email feature has been disabled, but imagine that you have successfully sent yourself the email.
d. Open one of the hotspots from the chart. Take a snapshot of a part of the hotspot item, make a note 
and send it to yourself via email for inclusion in your project. How did you do that? 


Note that the email feature has been disabled, but imagine that you have successfully sent yourself the email. 


TASK 7. Access associated media for the medieval chart image, find an appropriate media item and 
send another snapshot of it to yourself. 
a. What would you use to find associated images? 


b. Select an associated image. What do you think is the relationship between the associated image and
the main image?


c. Zoom in and out of the associated image. Are there any differences in the zooming functionality
between the main image and the associated image? 


TASK 8. Close the Library Explorer application and return to the main menu screen for the SUR40. 
a. What would you do to get back to the main icon menu? 


Abstract

Reliance on volunteer participation for citizen science has become extremely popular. Cutting across 
disciplines, locations, and participation practices, hundreds of thousands of volunteers throughout the 
world are helping scientists accomplish tasks they could not otherwise perform. Although existing projects 
have demonstrated the value of involving volunteers in data collection, relatively few projects have been 
successful in maintaining volunteers’ continued involvement over long periods of time. Therefore, it is 
important to understand the temporal nature of volunteers’ motivations and their effect on participation 
practices, so that effective partnerships between volunteers and scientists can be established. This paper 
presents case studies of longitudinal participation practices in citizen science in three countries—the 
United States, India, and Costa Rica. The findings reveal a temporal process of participation, in which 
initial participation stems in most cases from self-directed motivations, such as personal interest. In 
contrast, long-term participation is more complex and includes both self-directed motivations and 
collaborative motivations. 


1 Introduction


Around the world, the number and impact of biodiversity- and ecology-related citizen science projects are
greater than ever before. Hundreds, and sometimes thousands, of people of all ages, professions, occupations,
and locations, take part in these endeavors. The projects themselves range from those that can be done at 
home or in the backyard, such as sorting photographs of animals in order to document migration habits or 
counting the number and species of birds feeding from a birdfeeder, to remote and more complex fieldwork, 
including field observations, specimen collection, and long-term monitoring (see www.citizenscience.org and 
www.scistarter.org for a list of citizen science projects in the United States). Projects also vary according 
to their scope and target audience and can range from families and young students engaged in specific, 
short-term, local projects (such as “bioblitzes,” which are compressed forms of biological surveys aimed at 
capturing a snapshot of current ecological conditions), to long-term involvement in continuous projects that 
encompass global phenomena (Bonney et al., 2009; Rotman et al., 2013; Wiggins & Crowston, 2011). 
Volunteer involvement in scientific projects is also supported by new advances in technology, namely
Internet-based and mobile connectivity, which bring scientists, scientific research projects, and volunteers
closer than ever before.

Volunteers and scientists derive different benefits from participation in scientific research. Volunteer
motivations, especially, are complex, related to both individual and social differences, and they change over 
time (Boezeman & Ellemers, 2007; Clary et al., 1998; Locke, Ellis, & Smith, 2003). The main thrust of this 
study is to understand the underlying motivations affecting long-term participation of volunteers in citizen 
science projects. This is achieved by analyzing interview data from participants in citizen science projects 
in the United States, India, and Costa Rica. The research question that this paper addresses is: What are 
the motivations that affect volunteers’ initial and long-term participation in citizen science? 

The remainder of the paper will frame this research within previous contributions, introduce the 
methods used and the case selection, discuss findings from interviews with volunteers in three different 
countries, and explore differences between initial and long-term motivating factors in the three countries. 
The paper ends with a brief discussion of the implications of this study for future research in citizen science. 


2 Background 


Citizen science enables research based upon the work of volunteers, some of whom may be knowledgeable 
in the domain, yet who typically lack formal training (Bonney et al., 2009; Cooper et al., 2009). In addition 
to direct scientific collaboration, citizen science supports the “engagement of nonscientists in true decision- 
making about policy issues that have technical or scientific components” (Lewenstein, 2004), and can 
increase scientific literacy and interest (Miller-Rushing, Primack, & Bonney, 2012). 

Today, there is a growing reliance on volunteers’ contributions to science for various budgetary and 
practical reasons: scientists can no longer afford long excursions into the field, yet the potential for collecting 
data is greater than ever before, particularly if technology can be appropriately harnessed while still keeping 
humans in the loop. This deluge of data, coming from sensors, probes, observations, and computerized 
assessments, makes it difficult for even large teams of professional scientists to methodically collect and 
analyze the data without the help of volunteers. Over the past decade, citizen science has changed gradually 
and become more and more dependent on technology that reaches larger numbers of volunteers, often 
located remotely from professional scientists and from each other. To support the growing role of volunteers 
in scientific research, we must better understand what initially attracts volunteers and, perhaps most 
importantly, what motivates them to continue to participate for extended periods of time—an issue that 
has only recently begun to be explored (Rotman et al., 2013). 

Volunteers participate in collaborative activities for a wide variety of reasons at both individual 
and group levels. These general motivations include commitment to a larger cause, reputation gains, 
reciprocity, learning benefits, expression of self-efficacy, personal motivation types, and empathy (Batson, 
Ahmad, & Tsang, 2002; Lakhani & von Hippel, 2003; Preece & Shneiderman, 2009). Researchers have also 
examined how patterns of U.S. volunteerism change over a lifetime. Pearce (2003) reports a complex, cyclical 
pattern where volunteerism increases until age 18, decreases drastically in the early 20s, and rises again to 
reach a peak between 40 and 45. Bussell & Forbes (2003) have also studied volunteerism over time in the 
context of recruitment and retention cycles. However, while work within citizen science has touched on 
motivational concepts in different contexts (Nov, Arazy, & Anderson, 2011; Raddick et al., 2010), citizen 
science motivation has been studied in countries outside the United States to a lesser extent, with notable 
exceptions such as Bell et al. (2008). And while there is likely some overlap with temporal motivation in 
domains such as volunteerism (Bussell & Forbes, 2003), online social structures (Butler, 2001), and 
communities (Preece & Shneiderman, 2009), some factors unique to citizen science—such as the tendency 
of scientists to embrace volunteer contributions early on in a research project, but not at later stages (Kim, 
Robson, Zimmerman, Pierce, & Haber, 2011), and the unique role of expertise—suggest that domain-specific 
research is crucial to understanding motivations over time. 

Volunteers are people who give an asset such as time, resources, or attention freely and without 
the expectation of monetary or other reward (Dekker & Halman, 2003). Within the United States, many 
formal volunteer opportunities fall within existing establishments such as local organizations and religious
communities (Putnam, 2001). These communities provide an infrastructure able to utilize key factors, such 
as social support (Bussell & Forbes, 2003), that help sustain contribution over time. Without this 
infrastructure, it can be difficult for volunteers to move beyond brief or intermittent contributions (Penner, 
2004; Putnam, 2001). Researchers studying citizen science projects have identified certain types of projects, 
such as action projects, that are similarly rooted in place and thus interwoven with a community and its 
concerns (Wiggins & Crowston, 2011). 

The decision to volunteer is influenced by individual differences such as gender, access to
technology, age, income, family structure, level of education, and independence (Pearce, 1993; Terry,
Harder, & Pracht, 2012). It is also a function of the culture in which volunteers and projects are situated (e.g.,
relative emphasis on individualism vs. collectivism as described in Hofstede’s work (1980, 2001)). These
factors, along with evidence about cultural attitudes toward nature and ecology, were used to identify the 
three countries chosen for this study—the United States, India, and Costa Rica. 

This paper examines the motivations of citizen science volunteers in three countries and is based
on a larger work (Rotman, 2013). While an in-depth comparison of these cultures is beyond the scope of
this paper, an awareness that the motivations of citizen science volunteers are a function of individual and
group differences helps paint a holistic picture of motivation and an understanding that motivations change
over time.


3 Methods 


This research focuses on the motivational factors affecting short- and long-term participation practices of 
volunteers in ecology-related citizen science projects. Three independent cases were selected, based broadly 
on Yin’s description of a case as “investigat[ing] a contemporary phenomenon in-depth and within its real-life
context, especially when the boundaries between phenomenon and context are not clearly evident”
(2009, p. 18). The cases differ in the dominant demographics and in the professions, backgrounds, and 
education of their participants. The countries, which differ in their placement on various cultural dimensions 
proposed by Hofstede (1980, 2001), were chosen primarily because they offer different histories of citizen 
science, variation in the ways in which citizen science is practiced, and differing levels of formal and 
institutional support for citizen science projects (see Table 1). Sampling the different countries provided an 
opportunity to better understand the range of motivations and gain a more global perspective. This paper 
is not focused specifically on the differences across countries, though future work will consider that more 
directly. 


3.1 Three Exploratory Case Studies: The United States, India, and Costa Rica 


The website scistarter.com lists more than 400 citizen science projects in the United States alone. It is 
estimated that hundreds of thousands of people engage annually in these projects (National Science 
Foundation, 2012). Some ecology-related projects cut across local boundaries and are national in nature; 
but many are local, focusing on the immediate community or locality of volunteers. Most citizen science 
projects are supported through research programs in academic institutions, government agencies, and non- 
governmental organizations (NGOs), but a few are supported locally. 

India has numerous protected areas and natural sanctuaries that were developed in recent decades, 
but involvement of volunteers in science is relatively uncommon. The distinction between castes, and the 
differences in linguistic, religious, regional, social, and economic groups, have trickled down and made 
collaboration among the different groups difficult (Kannan, 1990). Countrywide collaborative scientific 
projects began to evolve in the mid-1990s with the People’s Biodiversity Register (PBR) as one of the first 
projects implemented across India (Gadgil, 2006). It aimed to support rural communities’ and individuals’ 
understanding of their ecological setting, document local ecological changes, and lead to local resource 
management and countrywide documentation of these actions. Following PBR, the Indian government
formed “Biodiversity Management Committees” that created biodiversity registers in consultation with the 
local people, which led the way to broader collaborative scientific projects that involved local “people’s 
knowledge” to enhance “official knowledge” (Gadgil, 2006). 

Costa Rica has the highest biodiversity density of any country in the world, with one of the highest 
proportions of protected land. Biodiversity is considered a national resource that can lead to economic 
prosperity: the monetary and economic value of conservation is emphasized by educational institutions and 
governmental organizations alike (Wallace, 1992). The country supports the use of private lands as natural 
preserves and environmental education centers through subsidies and direct payment (Langholz, Lassoie, & 
Schelhas, 2000). This deep commitment of both government and private organizations to conservation 
encourages citizen science projects focused on biodiversity. Funding for the projects comes from various 
governmental agencies, NGOs, and international and private organizations (Rotman, 2013). Key features 
of the countries chosen as case studies are listed in Table 1. 


3.2 Interviews and Analysis 


Interviews facilitate an understanding of the world from the participants’ perspective and aid in uncovering 
the meaning of people’s experiences by allowing for the development of rich descriptions and the integration 
of multiple points of view (Kvale, 1996, 2009). 

The selection of potential interviewees was based on “purposeful sampling” (Patton, 2002) in which 
a general framework for analysis provides an information-rich data set (Kozinets, 2002) as it cuts across 
participant variations in a way that portrays different demographics, interests, participation types, and 
engagement levels, but does not aim to create a representative sample. In addition, snowball sampling 
(Babbie, 2010; Biernacki & Waldorf, 1981) was used, in which interviewees pointed to others who could 
potentially provide rich information and/or were relevant to understanding pertinent issues of collaboration 
and motivation. Where snowball sampling was used, the chain of referral was followed until “conceptual
saturation” (Patton, 2002) was obtained. This resulted in 13 interviews in the United States, 22 in India,
and 9 in Costa Rica. Table 2 provides demographic information about the participants.


Country        Size and population             History of collaborative   Institutional support and
               (compared to other countries)   scientific projects        funding

United States  3rd largest in size,            Since the 19th century     Government, NGOs,
               3rd in population                                          educational institutions

India          7th largest in size,            Since the 1990s            NGOs, few educational
               2nd in population                                          institutions

Costa Rica     127th largest in size,          Since 1970                 Government, local and global
               121st in population                                        NGOs, local communities,
                                                                          educational institutions


Table 1: A comparison of various properties of collaborative scientific projects in the United States, India,
and Costa Rica


In all cases, the interviews were semi-structured, based on a general list of predefined concepts and probes 
(Rubin & Rubin, 1995) used by the interviewer to maintain control of the direction of the interview. In 
some cases, the interview protocol was modified slightly to address cultural sensitivities. The core concepts 
of the interviews were iterated upon and continuously developed throughout the interviews. Important 
concepts that were introduced by participants in the first interviews were included in later interviews, and 
in order to maintain similarity across populations, the same experiences and meanings were sought in the
different cultures. 


Country        Number of    Roles                          Other demographic information
               interviews

United States  13           Professional scientists (3),   6 males, 7 females
                            volunteers (10)

India          22           Professional scientists (6),   20 males, 2 females (more females were
                            volunteers (16)                invited to participate but they declined)

Costa Rica     9            Professional scientists (2),   5 males, 4 females
                            volunteers (7)


Table 2: Demographics of interview participants


The interviews in the United States were conducted in April and May 2010; in India in December 2011; 
and in Costa Rica in August-November 2012. Due to the geographic distance, many of the interviews were 
conducted over Skype. Three of the Costa Rican interviews were conducted in Costa Rican Spanish and 
translated into English. 

The interviews were analyzed using grounded theory (Strauss & Corbin, 1990). Interviews from 
each of the three countries were coded separately; within each country, interviews were first coded 
independently of each other to reflect major concepts (e.g., “motivational factors,” “initiating collaboration,” 
“work patterns,” etc.), and then synthesized according to emergent themes (e.g., “cycle of collaboration”). 
Themes from all three countries were then grouped into a codebook, which was modified and refined 
throughout the coding process to reflect emergent concepts. Once the codebook was finalized, the interviews 
were re-evaluated and coded according to the identified themes. The themes were then compared in and 
across cases. To aid in the analysis process, notes, citations drawn from the interviews, and drawings and 
visualizations of the relationships between codes and themes were used. The interviews were analyzed until 
conceptual saturation was achieved and no new concepts were identified (Morse, 1991). Additional data, 
such as content retrieved from relevant mailing lists, images, and artifacts, were also collected and analyzed. 
The names and personal details of all interviewees were changed to protect their anonymity. 


4 Findings


The themes that emerged from the data addressed initial participation, long-term participation, and
demotivating factors, which are discussed below.


4.1 Initial Participation 


As the data unfolded, it became apparent that participation was highly dependent on personal interest, but 
there was also a gap between intent and actual participation. While most interviewees expressed a favorable 
attitude toward citizen science, they did not participate unless a project had a personal value or benefit for 
them. Four factors were found to encourage initial participation: 


4.1.1 Personal interests 


Jill’s comment typifies the role of personal interest as an initial motivator: “I think personal interest comes 
first. Personal interest and personal gain, with information.” (Jill, USA). Some volunteers (especially in the 
United States and, to some extent, in Costa Rica) were actively looking for opportunities to extend their 
knowledge through participation in citizen science projects. Others (mostly in the United States and Costa 
Rica) stumbled upon such projects by chance. 

A slightly different case was that of an existing hobby that related to citizen science, such as
photography, art, travel, or sports (this was observed mostly in the United States). In these cases the
main motivational factor was the ability to use the citizen science project as a platform to promote their
hobbies, as illustrated by Danny: “I started looking for a way to share pictures so that I could learn more
about butterflies... and what started four or five years back is that some of the scientists said ‘OK ... please
go here and there.’ I do that when I can but only if I can also get the pictures that I want or at the time
that the light is good.” (Danny, USA)

Similarly, some volunteers found citizen science gave them an enjoyable opportunity to spend time 
with their friends and families and enhance their relationships through joint activities. In these cases, 
collaborative scientific projects had to be fun and engaging and speak directly to the interests and skill sets 
of potential volunteers. 


4.1.2 Self-promotion 


Self-promotion and furthering one’s own opportunities were also motivating, as these quotes illustrate: “...it
will benefit me to increase my knowledge and ... for my experience for my future prospects or any other.” 
(Abhinav, India) “[My motivation is] gaining the experience and seeing what it is, maybe having something 
for my resume.” (Joe, USA) 


4.1.3 Self-efficacy 


The depth and level of involvement offered to volunteers within each project also became a strong motivator, 
speaking to volunteers’ sense of self-efficacy and feelings of equality and control over the scientific process. 
This was best exemplified in Costa Rica, where many citizen science projects offered volunteers control of 
the data they contributed and open access to their data and the data of others for secondary studies, as 
indicated by Laura’s comment: “A volunteer can participate at any level of research in my opinion. From 
a person who has no experience and needs to be trained to participate, to someone who has the same 
academic qualifications as the scientists and who just isn’t being paid.” (Laura, Costa Rica) 

However, most citizen science projects, specifically in India and to a lesser extent in the United 
States, did not actively encourage volunteers to participate in analysis or conduct secondary studies. Some 
even rejected the idea of volunteer involvement after data collection altogether.


4.1.4 Social responsibility 


Interestingly, collectivistic motivations as antecedents to participation surfaced at the initial stage of the 
projects only in one case — that of Costa Rica. Costa Rican collectivistic culture, supported by education 
and practice, emphasized the principles of social responsibility toward natural resources and drew many 
people to explore the opportunities citizen science offered, with the intention of joining these projects in 
order to advance the greater good of society, as Jose’s comment illustrates: “I think if you visit Costa Rica
and you talk to a cop, driver, or maybe a bus driver or people that work in a restaurant, they will make 
you a conversation about the topics of environment and their importance, there’s a true moral thing.” (Jose, 
Costa Rica) 

The role of the education system in the support of local institutions should not be underestimated; but
even more than that, the collectivistic motives were the product of national pride in nature and grassroots
understandings of the role biodiversity has in maintaining and supporting the community. This introduces
an alternative view of initial motivation to participate, one related not only to the person volunteering
but also to the communities to which they belong.


4.2 Long-term Participation 


Whereas the previous section detailed the first step toward participation, i.e., the move from a favorable 
view of citizen science to actual participation, here the focus is on continuous participation for extended 
periods of time. Unlike initial motivations, which focused mainly on one’s self and related to the benefits
one expected as a result of participation, long-term participation was motivated through a range of
relationships. These relationships were negotiated between individual volunteers and those within and
outside of the projects. Within-project relationships were initiated and cultivated between participants of
the same project, predominantly among the volunteers themselves and between volunteers and scientists;
external relationships were those created between volunteers and others who didn’t take part in citizen
science projects, such as members of their communities, friends and families. Five factors were found to
encourage long-term participation.


4.2.1 Trust 


Scientists often saw volunteers as well-intentioned individuals who had a limited ability to fulfill substantial
scientific tasks and who needed to be monitored, as the comment from Madhu indicates: “... cross checking, cross
study is always advisable.” (Madhu, India). While the scientists acknowledged the need for volunteers’ help
in their work, they were hesitant to trust them with tasks that were more complex than simple data 
collection for fear of “data contamination,” low quality or complete lack of quality control, and potential 
deviance that would hinder their work. Volunteers, on the other hand, were shy of scientists, often seeing 
them as aloof and intimidating, speaking a particular jargon that was foreign to them. In quite a few cases, 
they did not even meet with the scientists throughout the project. Under these conditions, creating trust 
was difficult. However, some projects succeeded, and this success was often related to the governance 
structure of the project—the more centralized and pyramid-like the project was (where the leading scientists 
were removed from the volunteers), the less it resulted in trust between the groups, while relatively flat 
projects that enabled interaction between scientists and volunteers led to a slow build-up of personal 
relationships that facilitated trust. 


4.2.2 Setting common goals 


Setting the goals up front was used to create a common baseline of expectations among the various 
participants, and particularly between scientists and volunteers, as a scientist, Antonio, pointed out: 
“Communication must be constant and clear. A scientist has to be well-prepared to speak the language of 
citizens in order to clearly transmit their project and to inspire interest in people.” (Antonio, Costa Rica) 
Potentially contentious issues, such as roles, responsibilities, expected outcomes, and standards, were easier 
to address when they were openly communicated and discussed, or at least set out in a formal manner by 
the project’s leaders. Periodic discussion of these goals, which included volunteers as partners (or at the
very least alerted them to the existence of such goals), helped facilitate a positive rapport that
maintained volunteers’ sense of competency. Routine messages about the project’s status, goals, and
procedures helped remind volunteers of upcoming events or the continuity of the project, which was useful 
to those who were not deeply involved in it, and encouraged their participation for longer periods of time. 


4.2.3 Acknowledgement and attribution


While acknowledgement could take various forms, and the view of what constituted sufficient 
acknowledgment varied greatly, a minimal level of recognition was essential for facilitating long-lasting 
participation, as Suzan’s comment illustrates: “Just a name and this X and that Y was contributed by this
or that person. Something simple... is like a big thing for a normal person, this kind of thing make it very 
personal thing, and that way we encourage all to do it more ...” (Suzan, USA) 

The data revealed several aspects of acknowledgement that were either independent or interrelated, 
depending on the specific project and its settings. For example, some projects in the United States offered 
structured modes of acknowledgement that were open to all participants (periodic meetings in which 
volunteers’ work was showcased, or singling out individual volunteers for their contributions). Other projects 
offered lab meetings or meetings in the field, in which active volunteers and scientists interacted. In both 
cases these were pre-planned events that were meant to bring volunteers closer to the leaders of the projects
and highlight the work that they do. Most volunteers were not particular about the form acknowledgement
took, as long as some was made, and it was made public. However, the more “scientifically valid” the
acknowledgement, the more it was appreciated. In other cases, mostly in the United States and Costa Rica,
acknowledgement was provisory and impromptu, and came up through chance meetings among project
participants.

Like acknowledgement, attribution could be given in many ways—from a general acknowledgement 
that the data was obtained through collaboration with volunteers (without specifically naming volunteers 
or volunteer groups), to individual credit given to specific contributors. This was especially important where 
the data was used for outside publications (e.g., journal and conference papers, books, and online 
publications). Volunteers reported being disappointed to find that they were not acknowledged in
publications. This was observed across all types of projects, and in all three cases.


4.2.4 Mentorship 


As with the other themes related to within-project relationships, education and mentorship were based on
several separate but interrelated concepts: training, closeness, and empowerment. As Oscar notes, “I get 
the sense that a lot of people do recognize our motivation to do citizen science because of the educational 
aspect.” (Oscar, Costa Rica) 

Many of the volunteers who joined citizen science projects in order to advance their scientific 
understanding and sense of self-efficacy actively sought an ongoing relationship with scientists. In some 
cases this translated into mentorship of various forms: from close contact between scientists and volunteers 
to ensure that the research was done correctly, to close personal relationships between scientists and 
volunteers. Many volunteers appreciated every opportunity they were given to meet with scientists and 
were willing to give up time and resources (e.g., pay for travel) to accomplish that. However, not many 
senior scientists were interested in engaging with volunteers, unlike junior scientists, who saw great value 
in mentorship activities (perhaps because they were close enough to the apprenticeship process required of 
beginning scientists). 

Another form of mentorship came from the need to train volunteers: some projects offered or 
required initial or repeated training in order for volunteers to actively participate. Training varied according 
to the specific project needs, and could be as short as a few hours or as long as several days. Further, it 
could be free or require payment; and it could be done online (birdsong recognition audio tracks in the 
United States and Costa Rica) or in the field (scat and track identification outings in India and Costa Rica). 
In all cases, volunteers were appreciative of the opportunity to extend their knowledge and competencies. 
Although training of volunteers could offer scientists numerous advantages, including a higher level of data 
quality and deeper commitment among volunteers, not many embraced this opportunity to include training 
in their research protocol. 


4.2.5 External relationships 


Most volunteers did not become engaged in citizen science to create change but rather out of personal
interest. However, through their participation they became exposed to the effects citizen science can have
on their immediate environment and beyond it, and for some volunteers, this became a major cause. In 
turn, this cause motivated them to extend their participation outside the project, as Linda aspired: “[I] 
want to be kind of a liaison between the scientific field ... and the common person who has the questions 
and doesn’t know how to ask.” (Linda, USA) 

The shift from self-related motivation to a collectivistic one was not trivial. Volunteers have fewer 
avenues to extend their knowledge to others, and their status is not as highly regarded as that of professional 
scientists. Yet, in many cases they saw their role as mediators between local communities and scientists. 

Education as a motivational factor was especially salient where volunteers encountered remote 
communities whose exposure to conservation-related education was lacking. Beyond awareness, education
was seen as a tool to empower the local population and enable it to combine ecologically minded and
sustainable practices with its economic and social needs.


4.2.6 De-motivating Factors 


As described in the previous sections, favorable inclination toward citizen science has to be complemented
by various motivational factors to drive volunteers to participate in a project. At the same time,
de-motivating factors also affect participation, and particularly long-term participation, as Chris comments:
“Initially everybody’s enthusiastic and with time participation level keeps dropping down. It’s a very small 
percentage that continues giving information.” (Chris, USA) 

Attrition rates among volunteers in citizen science projects were discussed in all three cases, and
were estimated to range between 80 and 95 percent. This could be due to several issues: the lack of positive
motivational factors mentioned above, or alternatively, the existence of de-motivating factors. De-motivating
factors typically spoke to internal negotiations between the demands of the project and the
volunteer. Constraints involving time and problems associated with technology were the most prominent
de-motivating factors.


4.2.7 Time 


The following quote from Apurva sums up the time dilemma that some volunteers experience: “It depends 
on how much time I have to contribute to this project. The best thing is where I can just log in a few 
seconds or minutes the information that I want to pass on and I’m very happy to do that, but if I spend 
about hour or two to even send a particular record and all the details, maybe I want to take a rain check...[I 
want to spend] as little time as I can. [If] it’s going to impinge on my own work time, that’s something I 
don’t want to do.” (Apurva, India) 

Interest, enjoyment, challenge or other initial or continuous motivations were often not enough to 
overcome excessive time demands. Some volunteers complained that scientists had no appreciation of their 
time, and demanded that they engage in overly complex and time-intensive tasks. This was a common 
theme across all cases. While some volunteers appreciated intensive projects that made them feel more 
committed to the scientific goals, most volunteers balked at the thought of spending too much time (a 
subjective term that could stretch from a morning every week to continuous immersion in the field) on a 
given project. Similarly, projects that required extensive travel to remote areas (especially in India) were 
seen less favorably than local projects that could be interlaced with volunteers’ routines. 


4.2.8 Technology 


Projects that were (or could potentially be) made easy through the use of technology but failed to deliver
on that aspect frustrated volunteers and discouraged them, as Nina pointed out: “A lot of the schools I
worked with were like one-room schoolhouses, maybe they had a computer, but probably they didn’t. They 
probably didn’t have an Internet connection even if they had a computer, so that was a big challenge.” 
(Nina, Costa Rica). 

This problem was particularly apparent in India and Costa Rica, where the technological 
infrastructure is poor in some rural areas, and is somewhat limited even in urban areas (this was especially 
relevant to mobile and web connectivity—one interviewee reported that between 60 and 90% of the Indian 
population does not have Internet connectivity or literacy). Even in the so-called technologically advanced 
United States, interviewees reported accessibility and usability problems. 

Problems involving lack of technology or inadequate or poorly designed technology frustrated
volunteers and made them disenchanted with the projects. Projects that took into account the technological
barriers, understood the local infrastructural limitations, and made the relevant adjustments to enable
participation and task completion were the ones whose volunteers were engaged for longer periods of time.


5 Discussion


The findings reported above suggest that short-term and long-term participation tend to stem from different 
motivations, although there is some overlap. Without some initial self-directed motivation, such as a strong 
personal interest, participation will not happen, but without a broader motivation that goes beyond the 
self, such as commitment to conservation, long-term participation will not occur. 

Long-term participation is affected by myriad aspects, ranging from project-specific ones (e.g., type 
of project and local arrangements) to external relationships that extend beyond the project and affect 
individuals and communities outside it (e.g., national policies and culture). At the same time, de-motivating 
factors, such as poor communication or inadequate technical infrastructure, affect participation negatively 
and may cause attrition throughout the project lifecycle. By taking these findings into account and exploring 
how they relate to citizen science projects in different cultures, project managers and technology designers 
may gain insights about how to encourage long-term volunteer participation. How this is achieved will be 
influenced by the cultural setting in which the project is embedded, the type of project, the volunteer 
population, and how it relates to the scientists managing the project. 

Although some citizen science projects lend themselves well to singular contribution (e.g., single-day
bioblitzes), most projects, such as those dealing with conservation, investigation, or education, are long
term and necessitate the ongoing involvement of volunteers (Wiggins & Crowston, 2011). However, long-term
engagement is tricky to achieve, as was evident in all three cases. Facilitating long-term participation
was highly dependent on the existence of, or at the very least an awareness of, initial self-directed
motivations such as personal interests, self-promotion, self-efficacy, and social responsibility. Where these
were present, long-term motivations that reflected within- and across-group relationships became relevant.
The most salient of those were the relationships between volunteers and scientists, and between volunteers 
and their communities. Projects that did not support or actively facilitate these relationships suffered from 
high attrition. Paying attention to factors that motivate participants’ involvement is of particular 
importance when one considers that they are volunteering their time to a larger endeavor (see Terranova, 
2000 for a critique of free labor in the digital economy). 

The two most significant de-motivating factors mentioned by interviewees as affecting long-term 
participation were time commitments and technology availability and usability, which were often highly 
intertwined. According to many interviewees, their need for training and feedback was not given much 
attention; instead, technological tools were thought of as "cover-all" solutions. In all three cases, pen and
paper often proved not only more effective than highly complex computerized systems, but also
critical where no communications infrastructure actually existed. The gap between expectation and actuality
in terms of technology was prevalent not only in the developing countries (India and Costa Rica), but also 
in the United States, where many volunteers found complex online reporting systems too burdensome and 
taxing to learn or use, and preferred simple interfaces or offline reporting tools. 

Projects that did not offer straightforward communication suffered from higher attrition rates and 
lower long-term engagement rates. Where volunteers could easily contribute data, and also retrieve it or 
follow the path of use of the data they contribute—and specifically when they could see its broader impact 
on scientific advancements and their own communities—they were motivated toward deeper engagement 
for longer periods of time, and more complex missions. Projects that enabled this interaction, and also 
emphasized the human perspective (e.g., communication, feedback, training, etc.), got an even more positive 
response. Feedback, for example, is crucial, but so is the way it is delivered. Most projects in all three 
countries studied suffered in that aspect. 

Table 3 summarizes the motivating factors that affect short- and long-term participation within
the context of each of the cases that were studied.

Factor | Related concepts | Potential participants

INITIAL PARTICIPATION
Personal interest | Enjoyment, interest, ancillary hobbies, leisure, interest in nature | Individuals with ample time to spare or a very specific interest in nature; families, all ages
Self-promotion | Reputation building, social advancement, future employment | Individuals wanting to advance themselves (e.g., students, young adults)
Self-efficacy | Affecting scientific work, belonging to the scientific community | Educated individuals; relatively older adults
Social responsibility | Conservation, pride, national and local dependency | Individuals affected by the local culture and education system; relatively young adults

LONG-TERM PARTICIPATION
Within-project relationships
Trust | Data quality, skills, value, time, leadership roles | Experienced volunteers looking for close relationships with scientists
Common goals | Communication, updates, structured protocols | Volunteers looking to deepen their relationships with scientists
Acknowledgement | Recognition, attribution, value | All volunteers
Mentorship | Training, closeness, empowerment | Volunteers who wanted to become deeply involved in the project
External relationships
Education and outreach | Mediation, empowerment, local populations, knowledge | Long-standing volunteers who interact with locals
Policy and activism | Accountability, government, institutions, community | Long-standing volunteers who interact with locals

Countries (column values in source order): United States, India, Costa Rica; United States, Costa Rica; Costa Rica; United States, Costa Rica; Costa Rica; United States, India, Costa Rica; United States, India; United States, India, Costa Rica; United States, Costa Rica

Table 3: Breakdown of initial and continuous motivations according to the thematic concepts, and their
manifestation in each of the three cases


6 Conclusion


Many citizen science projects struggle with recruitment and attrition issues. Long-term volunteer
participation is key to the success of such projects, but few projects succeed in supporting it well. This
research suggests that approaches to engage citizen scientists must recognize the different reasons volunteers
join versus the reasons they continue participating, as well as the role of cultural differences. Initial
motivation for participation stems from self-related themes, in which volunteers are
inclined to participate in projects that address their interests and offer them self-advancement and
enjoyment. This correlates well with the existing literature that discusses initial motivation in this context 
(Bussell & Forbes, 2003; Butler, 2001; Preece & Shneiderman, 2009; Rotman et al., 2012). At a later stage, 
the motivational process becomes more complex and includes both self-related motivations and collaborative 
motivations that include within-project relationships and external relationships: a project has to show some 
value outside the actual tasks volunteers undertake in order to be deemed important enough to warrant 
continuous participation. 

The findings from the case studies discussed in this paper suggest several areas for future research. 
The role of feedback in motivating participation is recognized as important in many types of volunteerism 
(e.g., Zhu, Zhang, He, Kraut, & Kittur, 2013), including citizen science (He et al., 2013), and so is 
gamification (Bowser et al., 2012; Crowston & Prestopnik, 2013). The issues summarized in Table 3 above 
also offer suggestions for future research. In addition, although it is tempting to view citizen science projects 
as an opportunity for scaling up data collection at a relatively low cost, specifically in biodiversity and 
ecology research, this study shows that in three different countries, with diverse projects and volunteers, 
human interaction is a strong motivational factor. Following promising beginnings (e.g., Wiggins, 2013), 
considerable additional research is needed to understand how to scale up using technology while still 
ensuring human-to-human interaction both on- and offline, particularly between scientists and volunteers. 
In addition, future research is also needed to ensure appropriate checks on data quality, data standards, 
and policies across the world. 

The complexity of factors affecting long-term motivation and participation practices in each of the 
three countries indicates the need to tailor the design and implementation of each citizen science project 
according to the specifics of its purpose, location, available infrastructure, participation practices, and the 
expectations of potential volunteers, with attention to cultural context and sensitivities and realistic use of 
technology. This is an important area of research for citizen science and indeed, for any qualitative research
requiring extended engagement by participants.


Genre is an important feature for organizing and accessing video games. However, current descriptors of


1 Introduction


As video games increase in popularity, users expect efficient and intelligent access to them, similar to their 
access to other media. Game designers, manufacturers, scholars, educators, players and parents of young 
gamers all need meaningful ways of finding, accessing, and interpreting video games. As a first step in 
providing robust access to video games for diverse stakeholders, we need to understand the information
provided to users by current video game access systems.

Previous literature identifies genre as one of the most important features for accessing video games 
(Winget, 2011). As part of a larger research effort intending to improve access to video games, this paper 
explores the following two research questions: 


I. What are the different types of information that are represented in the genre labels that are 
currently used in available game organization systems? 

II. What are the facets and foci that can systematically describe the different types of information 
currently embedded in video game genre labels? 


In order to understand genre access offered by current systems, the authors identified multiple information 
dimensions represented in video game genres through facet analysis. Facet analysis is the process of 
examining a subject field and dividing it into fundamental categories, each of which represents an essential 
characteristic of division of the subject field (Spiteri, 1997). In this paper, we present a faceted classification 
scheme for video game genre based on our analysis of hundreds of pre-existing genre labels collected from 
existing video game organization systems. We provide definitions and explanations for each facet as well as 
examples of foci (i.e., indexing terms), along with a discussion of issues and challenges in representing video
game genres.
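
To make the idea of facets and foci concrete, the following is a minimal sketch of how a faceted genre description might be stored and used to collocate games. It is only an illustration under stated assumptions: the facet names ("gameplay", "setting"), the foci, the game titles, and the helper `collocate` are all hypothetical and are not the facets proposed in this paper.

```python
from typing import Dict, List, Set

# A catalog maps each game title to its facets; each facet holds a set of
# foci (indexing terms). All names and values here are hypothetical.
Catalog = Dict[str, Dict[str, Set[str]]]

catalog: Catalog = {
    "Game A": {"gameplay": {"racing"}, "setting": {"space"}},
    "Game B": {"gameplay": {"puzzle"}, "setting": {"fantasy"}},
    "Game C": {"gameplay": {"racing", "puzzle"}, "setting": {"fantasy"}},
}


def collocate(games: Catalog, facet: str, focus: str) -> List[str]:
    """Return the sorted titles that carry `focus` under `facet`."""
    return sorted(
        title
        for title, facets in games.items()
        if focus in facets.get(facet, set())
    )


# Each facet can be queried independently of the others.
racing_games = collocate(catalog, "gameplay", "racing")
fantasy_games = collocate(catalog, "setting", "fantasy")
```

Because every facet is recorded separately, a single game can be found through any of its dimensions, which a single flat genre label such as "fantasy racing puzzle" would not support as directly.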
2 Background and Related Work


2.1 Research Question 


Currently available video game organization systems stem from two sources: the field of library and 
information science (LIS); and commercial systems on the Web, such as game sales or review websites 
(industry and fan-based). Both sources illuminate problems in helping users access video games. 

Non-book materials in libraries often end up described by form rather than content (Leigh, 2002). 
Items are organized and accessed according to physical format (e.g., VHS, DVD, cassette) rather than
grouped conceptually (such as collocating book and movie versions of Pride & Prejudice). Shoehorning
non-book objects into a bibliographic description creates sub-optimal descriptions (Hagler, 1980), making it hard
for people to find what they seek. Indexers also face challenges in describing games with bibliographic 
standards. For example, video games do not come with title pages, so rules stipulating transcription of 
information from title pages are unusable for video games. 

Other bibliographic models attempt to address this problem, such as Functional Requirements for 
Bibliographic Records (FRBR) (IFLA, 2009), but fundamental problems arise when applying these ideas to 
video games (McDonough et al., 2010). Descriptions based on the context of an object, such as a user’s
reaction (e.g., mood), or similarity-based relationships (i.e., similar games), which can be significant in the
context of video games, are not represented in FRBR (Lee, 2010). Despite a focus on improving particular
user tasks, FRBR is limited because it is derived solely from descriptions of information objects rather than
from studies of users’ desired descriptive information.

Library of Congress Subject Headings, designed to describe all materials held by libraries, contains 
only 219 headings (out of about 337,000) for describing video games, mostly by name (e.g., Halo, Legend of
Zelda). Consequently, many notable series are missing (e.g., Final Fantasy, Mass Effect) and these subject 
headings cannot be used for collocating similar games outside of a particular series. In addition, there are 
only five genre headings for video games: Computer adventure games, Computer baseball games, Computer 
flight games, Computer war games, and Computer word games. Such genre headings are limited at best, 
and hamper both searchability and browsability. 

A small number of LIS studies on metadata for video games exist (e.g., McDonough et al., 2010; 
Winget, 2011), but they tend to focus on older games due to an interest in preservation. These studies, 
however, consider game information from a data- or creator-centric point of view, rather than that of an 
end user. 

Alternatively, the web contains a massive amount of information about video games, scattered across many
sources. Such a wealth of information, however, creates a poverty of certainty in determining authority and 
trustworthiness. Websites like Amazon.com, Mobygames, GameSpot, etc. are generally geared toward 
purchasing decisions and provide mostly basic descriptive elements like title, platform, genre, release date, 
and publisher. Websites like Wikipedia provide large amounts of descriptive information, but it is 
unsubstantiated, unstructured, and cumbersome to navigate. As a result, users leverage multiple sources to 
find and cross-check information across sites. 

These limitations of current organization systems motivated us to explore innovative ways to
provide subject access to video games beyond basic descriptive elements, that is, information that can better
inform users about the content or “aboutness” of the game. Doing so can help future systems
collocate similar games and make more intelligent recommendations.


2.2 Video Game Genres 


In the interdisciplinary field of Game Studies, video game genres are simultaneously well-understood (for 
example, something like “SW:TOR is my favorite MMORPG”) and completely opaque (“It’s almost like a 
mix of Call Of Duty, Bejeweled, and Kirby’s Epic Yarn — but different!”). Such confusion may stem from
debates about the nature of video games. Narratologists argue games are texts with narrative structures
and devices (like films), while ludologists argue that games are interactive experiences focused on gameplay 
and game mechanics. The complexity of video game genres suggests games are both. 

One of the earliest scholars to tackle game genre, Wolf (2001) modeled his system after the Library
of Congress Moving Image Genre-Form Guide. He created 42 categories of games based largely on
gameplay and interactivity (e.g., Abstract, Gambling, Racing, etc.). He deliberately excluded other elements, 
like mood or theme, as his system was intended to be used alongside an imagery- or style-based system 
(such as film genres). However, Wolf’s system is commonly critiqued for over-reliance on early-era examples 
(e.g., Space Invaders, Frogger, etc.) to build definitions, and for failing to accommodate modern genres, 
such as MMORPG or First Person Shooter (Clearwater, 2011; Whalen, 2004). 

King and Krzywinska (2002) describe a four-tiered hierarchy that emphasizes interactivity rather
than narrative: Platform refers to the gaming hardware; Genre refers to “broad categories such as
‘action-adventure’, ‘driving’, or ‘strategy’,” (p. 26); Mode refers to players’ experiences of the gameworld;
and Milieu refers to “location and atmospheric or stylistic conventions” (p. 27). Whalen (2004) argues
that this hierarchy fails to create a common language by ignoring game websites, and that, as
Clearwater (2011) also notes, these terms describe game elements occurring simultaneously, rather than hierarchically.
Whalen instead suggests that most games can be divided into three categories: Massive games that are
networked (thereby enabling massive numbers of players); mobile games designed for smaller screens and
shorter play times; and real games “requir[ing] players to physically relocate themselves as an act of playing
the game” (p. 301). Whalen’s terms challenge the notion of genre, forcing new consideration of the
constitutive elements of many games. 

Apperley (2006) proposed a detailed view on four common terms describing video game genres. 
Simulation games mimic physical world activities — but only to the extent that such mimesis does not 
interfere with entertainment. Strategy games require collecting, processing, interpreting, and accessing 
information via the game interface. Action games rely on the performativity of the player. Role-playing 
games are marked by changes in and valuations of players’ avatar characteristics (e.g., changes in level, 
power, armor, etc.). These definitions offer a critical view of genre often missing from larger discussions 
about the relationships among genres and players. 

Elverdam and Aarseth (2007) present their typology as an iteration on an earlier version (see 
Aarseth, Smedstad & Sunnanå, 2003). Their goal is to provide a tool enabling game designers to 
communicate with academics, game journalists, and players. The revised typology presents eight 
metacategories (e.g., Player Relation, External Time). Each metacategory has two to three unique 
dimensions (e.g., Teleology, Mutability, Synchronicity). Each dimension has two to three elements (e.g., 
Mimetic/Arbitrary, Finite/Infinite). The typology can then be used to compare games to find similarities 
and differences. The authors highlight the importance of “a knowledge base of classified games that is 
accessible to a broader field of researchers and developers” (p.20), which supports our goal. 

None of the systems reviewed above offers sufficient tools for categorization. Like the library-, 
industry-, and fan-based systems described earlier, game studies scholars find many dimensions of 
information embedded in video game genre descriptions. While some demonstrate attempts to tease out 
these different dimensions, most of these authors rely heavily on literary genre theory or film genre theory, 
revealing the narratological bias of these early works. Only Elverdam & Aarseth (2007) crafted a system 
based solely on games themselves. However, even their typology suffers from the challenge of complexity, 
and is best viewed as a ‘meta-tool’ to begin thinking about classification. 

Classification theory seems to be alien in game studies, so we bring a fresh approach to a long- 
standing problem in this area. As game studies shifts towards a more ludological perspective rather than a 
strictly narratological view, discussions of genre face an impasse. We believe our work can provide forward
momentum.


3 Study Design


3.1 Method 


We employed the method of facet analysis to tease out the different types of information that are represented 
in current video game genre labels. Facet analysis is based upon two processes (Ranganathan, 1967; Spiteri, 1997): 


a) Analysis, whereby a subject field is divided into fundamental categories, each of which represents 
an essential characteristic of division of the subject field; 

b) Synthesis, whereby individual concepts from these categories may be combined to express compound 
subjects (Spiteri, 1997 p.21) 


Broughton (2008) asserts that “facet analysis is the only means of organizing the concepts in a subject 
domain that has a logical and intellectual basis” (p.193). Many systems are built after facet analysis has 
been conducted, but the analysis method itself is rarely discussed in detail. 
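
The two processes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the facet names are taken from this paper, but the two-item foci lists and the `synthesize` helper are assumptions made purely for illustration.

```python
from itertools import product

# Analysis: the subject field is divided into facets, each with its own foci.
# Facet names come from the paper; the short foci lists are illustrative only.
facets = {
    "Gameplay": ["Action", "RPG"],
    "Mood/Affect": ["Dark", "Humorous"],
}

def synthesize(selection):
    """Synthesis: combine one focus per chosen facet into a compound subject."""
    for facet, focus in selection.items():
        if focus not in facets.get(facet, []):
            raise ValueError(f"{focus!r} is not a focus of facet {facet!r}")
    return " / ".join(f"{facet}: {focus}" for facet, focus in selection.items())

compound = synthesize({"Gameplay": "RPG", "Mood/Affect": "Dark"})

# Every pairing of the stored foci is expressible without enumerating it in advance.
all_compounds = list(product(*facets.values()))
```

With two foci under each of two facets, four stored terms are enough to express four compound subjects; the gap widens quickly as facets and foci are added.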

Broughton and Slavic (2007) outline their application of facet analysis for the creation of a faceted 
classification for humanities resources. They looked at extant schemes and pulled terms from them to name 
and organize the facets to be used in their system. Using Ranganathan’s fundamental categories as a starting 
point, they then modified them substantially to reflect the particularities of the humanities (e.g., abstract 
concepts, philosophical concepts, etc.). Gnoli and Hong (2006) discuss their application of facet analysis in 
their description of developing the Integrative Level Classification. Building on the fundamental categories 
of the Classification Research Group, they refined the categories on the basis of increasing complexity. This 
experimental work involved the classification of small corpuses of documents to test their system and to 
undertake research into appropriate user interfaces. 

Each of these studies shows that working from an existing controlled vocabulary or set of indexing 
terms is a useful approach when undertaking a facet analysis. By employing facet analysis, we attempted 
to create a conceptual map of a subject field: video game genres. Our process of facet analysis is summarized 
in Figure 1. 


3.1.1 Process of Developing Facets and Foci: Analysis 


For the Analysis part, we started by conducting a domain analysis of how genre labels are currently being 
used in the video game community. This process consisted of two parts: 1) literature review on video game 
genres (described above), and 2) collection of empirical genre data, i.e., the actual genre labels used in 
existing video game organization systems as well as game-related literature. 

We ended up with 804 instances of genre labels from multiple game-related websites and online 
directories/encyclopedias (e.g., Allgame, GameFAQs, Gamespot, Mobygames, IGN, Giantbomb, dmoz, 
ranker, Wikipedia, Amazon) as well as previous literature related to game genres (e.g., Apperley, 2006; 
Djaouti et al., 2007; Djaouti et al., 2008; King, Delfabbro & Griffiths, 2010; Foster & Misha, 2011; Wolf, 
2001). After eliminating duplicates, areas of conceptual overlap, and labels that were not applicable (e.g., 
labels describing interactive media such as word processors or image editing software), we used card sorting 
to organize and elicit categories describing different types of genre labels. Card sorting can be used as an 
exploratory technique as part of the piloting work without any preliminary elicitation using other techniques 
(Rugg & McGeorge, 2002). Here, it enabled us to create a conceptual framework to organize hundreds of 
genre labels. The terms were printed on paper strips and organized into homogeneous, mutually exclusive 
categories representing specific characteristics of division of video game genres. This information was later 
digitized into a spreadsheet where labels found on new sources were continually added. 
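
The purely mechanical part of this cleanup, removing duplicate labels gathered from different sources, could be sketched as follows. The sample labels and both helper functions are hypothetical; resolving conceptual overlap and grouping labels into categories relied on card sorting and human judgment, which this sketch does not attempt.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())

def deduplicate(labels):
    """Keep the first spelling seen for each normalized label, in input order."""
    seen = {}
    for label in labels:
        seen.setdefault(normalize(label), " ".join(label.split()))
    return list(seen.values())

# Hypothetical raw labels as they might arrive from several websites.
raw = ["Platformer", "platformer ", "Tower  defense", "Tower defense", "Shmup"]
unique = deduplicate(raw)
```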

After identifying major categories, we identified and named the facets to reasonably describe the 
specific characteristics of division (e.g., Gameplay, Artistic style, Theme, Mood/Affect). The final list of 
facets was chosen based on the seven guiding principles for selecting facets by Spiteri (1998): 
• Differentiation: Facets should distinguish clearly among the component parts.

• Relevance: Facets should be chosen for their relevance to the purpose, subject, and scope of the
classification.

• Ascertainability: Facets should represent characteristics of division that can be measured.

• Permanence: Facets should represent permanent qualities of the item being divided.

• Homogeneity: Facets must represent only one characteristic of division.

• Mutual exclusivity: Two facets cannot overlap.

• Fundamental categories: Categories should be derived based on the nature of the subject being
classified.
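
Of these principles, mutual exclusivity lends itself to a simple automated check on a draft scheme. The sketch below is an assumption about how such a check might look, not a step reported by the authors: it only flags terms filed under more than one facet, which is a rough proxy for conceptual overlap. The draft scheme deliberately contains a hypothetical conflict.

```python
from collections import defaultdict

def overlapping_foci(scheme):
    """Return {focus: [facets]} for every focus filed under more than one facet."""
    homes = defaultdict(list)
    for facet, foci in scheme.items():
        for focus in set(foci):
            homes[focus].append(facet)
    return {focus: sorted(fs) for focus, fs in homes.items() if len(fs) > 1}

# Draft scheme with a deliberate, hypothetical overlap: "Retro" under two facets.
draft = {
    "Artistic style": ["Abstract", "Cel-shaded", "Retro"],
    "Presentation": ["2D", "3D", "Retro"],
    "Mood/Affect": ["Dark", "Peaceful"],
}
conflicts = overlapping_foci(draft)
```

A non-empty result signals that a term needs to be moved, split, or redefined before the facets can be considered mutually exclusive.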


Next, we clearly defined each of these facets. This critical component differentiates our scheme from many
current game genres, as a major problem during our analysis was that genre labels lacked definition. Then we
organized the foci for each facet based on the genre labels collected. The term “foci” is commonly used to
refer to indexing terms in facet analysis (Spiteri, 1998). Then we evaluated each term based on potential
end-user warrant, considering the term’s popularity and potential user familiarity. We sought domain
expertise from the creators of SIMM (Seattle Interactive Media Museum) to make this judgment.


3.1.2 Process of Developing Facets and Foci: Synthesis


In the Synthesis phase, we reduced the size and complexity of the foci in the genre scheme. By combining
separate indexing terms, it is possible to represent complex and compound subjects without the need for
enumerating all those concepts. Like Gnoli and Hong’s (2006) test, we described a variety of sample games
representing diverse gameplays and platforms to identify any problems with the scheme and remedy
them by adding, deleting, or modifying the indexing terms. Through this iterative process, we continued to
evaluate and refine the indexing terms.

[Figure 1: Steps Involved in Facet Analysis. Flowchart boxes, top to bottom: Collect currently used genre
labels; Categorize genre labels into major groups; Select proper name for each group (facet); Organize the
genre terms (foci) for each facet; Evaluate the foci for each facet; decision “Term adequate?” with a “No”
branch to “Revise term”; Define foci as necessary; Establish the final list of facets and foci.]

This term control process also involved evaluating the specificity of the terms, controlling 
homographs, maintaining term consistency and word forms, semantic factoring, and so on (Aitchison, 
Gilchrist & Bawden, 1997). Afterwards, we defined the foci in order to clearly convey the meanings of the 
terms and thus established the final set of facets and foci. 


3.2 Limitations 


We acknowledge the inevitable “incompleteness” of the scheme. There will always be some game-related 
websites excluded from our sampling, and certain types of games not included in the sample games used to 
evaluate the terms. Additionally, even domain experts and enthusiasts are challenged by understanding the 
meanings of all the indexing terms (e.g., What is the difference between “Shmup” and “Light gun” games?; 
What does “Exergaming” mean?) as well as the domain itself (e.g., Do “Meditation” games really exist?; 
How has “RPG (Role-Playing Game)” evolved over time and what are the differences between “JRPG” 
and “Western RPG”?). 

We plan to mitigate these issues by systematically evaluating the scheme via soliciting feedback 
from gamers, and continuing to evaluate video games against the scheme, especially newer games such as 
digitally downloadable games and apps. Creating any controlled vocabulary requires continuous attention 
and ongoing maintenance. However, because faceted classification offers increased flexibility and 
extensibility over other systems like hierarchical taxonomies or keyword lists (Kwasnik, 1999), this scheme 
is designed to be easier to update and revise. As the video game domain evolves, we plan to continually 
revise our scheme with help from domain experts and enthusiasts. 


4 Data and Discussion 


4.1 Facets and Foci 


Twelve facets were identified, each representing a different characteristic of division related to video game 
genres. The first column of Table 1 lists the facets. The number of foci (i.e., indexing terms) identified 
under each facet is provided in the second column. The third column illustrates a small number of foci 
examples for each facet.¹ Certain facets and foci were structured hierarchically: for instance, the facet 
Gameplay has the sub-facet Style describing more specific kinds of gameplay; Theme has 22 parent terms 
that are divided into 127 child terms; and Setting was divided into two sub-types “Spatial” and “Temporal.” 
The following subsections provide more detailed information on each facet and challenges faced when 
defining them. 

Determining the characteristics of division in video game genres was not clear-cut and required 
much discussion. Many questions arose, some of which we could not answer in a satisfying manner. However, 
this faceted scheme provides a flexible framework that can represent multiple foci under any facet, thus 
allowing a more thorough representation of the subject of a video game. The scheme is easily extendable 
and therefore accommodating of an unlimited number of new foci as games evolve. Note that for some facets
that emerged during our work, we were able to identify only one or two instances of the genre labels
representing those facets (e.g., Forms of expression: textual or graphical; Number of players: MMORPG,
MMOFPS); these facets were therefore excluded from our final scheme.


Facet             Number of Foci              Examples of Foci

Gameplay          10                          Action, Fighting, RPG, Strategy
Style             100                         Under gameplay “Action” (Beat’em up, Platformer, Rhythm); under gameplay “Shooter” (Shmup, Light Gun, Run & Gun)
Purpose           7                           Education, Entertainment, Party
Target Audience   18                          Everyone (ESRB), 12+ (iTunes), MA-17 (VRC), Low maturity (Android)
Presentation      10                          2D, 3D, Grid-based, Side scrolling
Artistic style    9                           Abstract, Cel-shaded, Retro
Temporal Aspect   7                           Real-time, Turn-based, Multiple game clocks, Timed action
Point-of-view     4                           First person, Third person, Overhead, Multiple perspectives
Theme             22 (parent), 127 (child)    Nature: Animals, Dinosaurs; Food: Restaurant, Bakery; Fantasy: Princess, Knights; Sports: Baseball, Basketball
Setting           16 (spatial), 8 (temporal)  Spatial: Casino, Spaceship, Western, Urban; Temporal: Medieval, Modern, Futuristic, Steampunk
Mood/Affect       15                          Horror, Humorous, Dark, Peaceful
Type of ending    5                           Finite, Branching, Circuitous, Infinite, Post-game


Table 1: Video Game Genre Facets with Examples of Genre Labels Representing Each Facet

¹ Space limitations prevent a full list of foci for certain facets such as Style or Theme. The full scheme will be accessible on
http://gamer.ischool.uw.edu/.
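
The parent/child structure of the Theme facet can be pictured as a small mapping. The sketch below uses only the example terms shown in Table 1; the `parent_of` and `expand` helpers are hypothetical names introduced for illustration and are not part of the published scheme.

```python
# Parent -> child foci for the Theme facet, using only the examples in Table 1.
theme = {
    "Nature": ["Animals", "Dinosaurs"],
    "Food": ["Restaurant", "Bakery"],
    "Fantasy": ["Princess", "Knights"],
    "Sports": ["Baseball", "Basketball"],
}

def parent_of(child):
    """Look up the parent theme of a child term, or None if it is unknown."""
    for parent, children in theme.items():
        if child in children:
            return parent
    return None

def expand(parent):
    """Broaden a search on a parent theme to include all of its child terms."""
    return [parent] + theme.get(parent, [])
```

A search interface built on such a structure could, for example, treat a query for Nature as also matching games indexed under Animals or Dinosaurs.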


4.1.1 Gameplay 


In this scheme, Gameplay is defined as “the overall nature of the experience defined by a pattern of 
interactions and game rules.” The foci listed under the facet Gameplay are the terms used most commonly 
as genre terms in typical video game organization systems. This particular facet is also considered the most 
fundamental category of all facets in our scheme. Discussions among researchers and SIMM staff led to ten 
chosen foci: Action, Action/Adventure, Driving/Racing, Fighting, Puzzle, RPG, Shooter, Simulation, 
Sports, and Strategy. Our definitions of these foci along with game examples are provided below: 


• Action: Games with a heavy emphasis on a series of actions performed by the player in order to
meet a certain set of objectives (e.g., Super Mario Bros., Patapon)

• Action/Adventure: Games which are set in a world for the player to explore and complete a certain
set of objectives through a series of actions (e.g., The Legend of Zelda, Prince of Persia)

• Driving/Racing: Games involving driving various types of vehicles as the main action, sometimes
with an objective of winning a race against an opponent (e.g., Mario Kart, Gran Turismo)

• Fighting: Games in which the player controls a game character to engage in combat against an
opponent (e.g., Street Fighter, Mortal Kombat)

• Puzzle: Games with an objective of figuring out the solution by solving enigmas, navigating, and
manipulating and reconfiguring objects (modified from Wolf, 2001) (e.g., Tetris, Minesweeper)

• RPG: Games with an emphasis on the player’s character development and narrative components
(e.g., Final Fantasy, Mass Effect)

• Shooter: Games involving shooting at, and often destroying, a series of opponents or objects (Wolf,
2001) (e.g., Doom, Duck Hunt)

• Simulation: Games intending to recreate an experience of a real world activity in the game world
(e.g., SimCity, Trauma Center)

• Sports: Games featuring a simulation of particular sports in the game world (e.g., FIFA series, Wii
Sports)

• Strategy: Games characterized by players’ strategic decisions and interventions to bring the desired
outcome (modified from Apperley, 2006) (e.g., StarCraft, Total War series)


One key challenge in developing these foci was mutual exclusivity. Some categories seem to have unclear 
boundaries and overlap conceptually. Many questions emerged, such as: how different are Action and 
Action/Adventure games? How about games employing multiple gameplay components such as 
Action/Adventure, RPG, and Puzzle? Are all games essentially Action games since they all require some 
actions performed by the player? The category Action in particular seemed akin to the music genre “Pop” 
or movie genre “Action” which work to specify a particular type of cultural object as well as a “catch-all” 
category. A first attempt at resolution focused on providing simple and clear definitions with example games
for each of the categories. In future work, we plan to further test and evaluate these foci by cataloging a
larger number of sample video games and investigating whether gamers are reasonably able to comprehend
and distinguish among these different labels of gameplay.


iConference 2014 Jin Ha Lee et al.


4.1.2 Style 


Style is defined as “a particular distinctive characteristic, mode of action, or manner of a gameplay.” This 
facet functions as a Gameplay sub-facet. The foci from both facets are combined to create a compound
indexing term (e.g., Action - Platformer; Action/Adventure - Stealth; RPG - MMORPG; Strategy - Tower
defense). This facet allows for a more intelligent collocation of similar games under a particular Gameplay
such as the sub-categories of Beat’em up vs. Platformer, which are both types of Action gameplay. 
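
A minimal sketch of how a compound Gameplay and Style term might be built and then used for collocation is given below. The separator, the helper names, and the specific style assignments for each game are assumptions for illustration; they are not taken from the authors' records.

```python
from collections import defaultdict

def compound_term(gameplay, style):
    """Join a Gameplay focus and a Style focus into one indexing term."""
    return f"{gameplay} - {style}"

def collocate(games):
    """Group game titles by Gameplay, then by Style within each Gameplay."""
    shelves = defaultdict(lambda: defaultdict(list))
    for title, gameplay, style in games:
        shelves[gameplay][style].append(title)
    return shelves

# Assignments here are illustrative, not taken from the authors' data.
games = [
    ("Super Mario Bros.", "Action", "Platformer"),
    ("Patapon", "Action", "Rhythm"),
    ("Duck Hunt", "Shooter", "Light Gun"),
]
shelves = collocate(games)
```

Grouping first by Gameplay and then by Style keeps all Action games together while still separating, say, Platformer titles from Rhythm titles.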


4.1.3 Purpose 

Purpose is defined as “the reason for why the game exists as intended by the game designer(s)/developer(s).”
Purpose emphasizes the intention(s) of game designers and developers rather than that of end users; how
users ultimately use the game is contextual and subjective. Our final list comprises six purposes:


• Education: Games in which the goal is to support learning. There is a broad range of educational
games, from those teaching spelling to computer programming to animal facts, etc. (e.g., Big Brain
Academy: Wii Degree, Carmen Sandiego series)

• Entertainment: Games in which the goal is to allow the player to have fun. A large majority of
games have entertainment as their purpose. (e.g., Mass Effect, Kingdom Hearts, Super Mario Bros.)

• Exercise: Games in which the goal is to get players to move their physical bodies and burn calories
or participate in some type of athletic pursuit. (e.g., Wii Fit series, Dance Dance Revolution)

• Meditation: Games which help support players’ engagement in meditation and mindfulness
activities. (e.g., Leela, Meditation Balance Game on Wii Fit)

• Party: Games designed to be played in the setting of a social gathering. These games are designed
for relatively short-duration play, allow for multiple players and quick turn-taking, and may also be
designed to be spectator-friendly for the enjoyment of those who are not currently playing. (e.g.,
Mario Party, Rayman Raving Rabbids, Wario Ware)

• Social: Games designed to revolve around heavy social interaction rather than playing in solitude.
The players engage in group activities such as making friends, chatting, sending daily gifts, teaming
up for tasks, etc. (e.g., Farmville, CityVille, Gaia Online)


4.1.4 Target Audience 


Target audience is defined as “a group of people for whom the resource is intended or useful, determined 
by the creator or the publisher of the game.” Rather than creating another scheme to represent this 
information, we decided to incorporate existing rating information from organizations such as the 
Entertainment Software Rating Board (i.e., Early Childhood, Everyone, Everyone 10+, Teen, Mature,
Adults Only, Rating Pending) or Videogame Rating Council (i.e., General Audiences, Mature Audiences-13,
Mature Audiences-17, Not Yet Rated). Apple and Android also have their own rating systems for game
apps based on the age of the player (i.e., 4+, 9+, 12+, 17+) and the level of maturity in the game content
(i.e., Everyone, Low maturity, Medium maturity, High maturity), respectively.
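
Because the same label can occur in more than one rating system (“Everyone” appears in both the ESRB and Android lists above), a Target Audience focus is only unambiguous when paired with its source. The sketch below shows one way this could be enforced; the dictionary keys and the `target_audience` helper are assumed names, while the rating values are those listed in the text.

```python
# Rating vocabularies as listed in the text, keyed by an assumed short name.
RATING_SCHEMES = {
    "ESRB": ["Early Childhood", "Everyone", "Everyone 10+", "Teen",
             "Mature", "Adults Only", "Rating Pending"],
    "VRC": ["General Audiences", "Mature Audiences-13",
            "Mature Audiences-17", "Not Yet Rated"],
    "Apple": ["4+", "9+", "12+", "17+"],
    "Android": ["Everyone", "Low maturity", "Medium maturity", "High maturity"],
}

def target_audience(scheme, rating):
    """Build a focus such as 'Teen (ESRB)', rejecting ratings unknown to the scheme."""
    if rating not in RATING_SCHEMES.get(scheme, []):
        raise ValueError(f"{rating!r} is not a rating in scheme {scheme!r}")
    return f"{rating} ({scheme})"
```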


4.1.5 Presentation


Presentation is defined as “the manner or style of game display” containing the following ten foci:


• 2D: Representation of space in two dimensions. (e.g., A Boy and His Blob, Odin Sphere)

• 3D: Representation of space in three dimensions. (e.g., God of War, Uncharted)

• Isometric: Games that use isometric projection to render three-dimensional objects in two
dimensions. (e.g., Final Fantasy Tactics, Age of Empires)

• Static background: Games with a background display that does not move or change. (e.g., Peggle,
Princess Maker)

• Vertical scrolling: Games with a display that scrolls vertically where characters typically move from
bottom to top. (e.g., 1942, Raiden)

• Side scrolling: Games with a display that scrolls horizontally where characters typically move from
left to right. (e.g., Muramasa, Castlevania: Symphony of the Night)

• Grid-based: Games featuring a display that is made up of a series of intersecting vertical and
horizontal axes. (e.g., Bejeweled, Tetris)

• Video backdrop: Games based on interacting with a motion-video backdrop, either as scenery or an
enemy (modified from mobygames.com). (e.g., Area 51, EyeToy Groove)

• Text-based: Games that use text as the main display method.

• Perspective manipulation: Games where characters are able to switch between multiple display
methods (e.g., 2D to 3D or vice versa). (e.g., Super Paper Mario, Perspective)


Defining this facet presented another challenge: what is the nature of the relationship between the 
Presentation and Artistic style (see below) facets? After lengthy discussion and examination of extant terms 
and screenshots of game displays, we determined that it would be useful to separate the technical aspects 
from the artistic or aesthetic aspects of game display. Thus two different facets in our scheme describe the 
visual aspects of video games. 


4.1.6 Artistic style 


Artistic style is defined as “a cohesive and unifying visual aesthetic.” A total of nine foci were identified:


• Cartoon: A style that incorporates elements typical in Western comic books and animations. (e.g.,
Batman: The Scarecrow’s Revenge, Plants vs. Zombies)

• Anime/Manga: A style that incorporates elements typical in Japanese comic books and animations.
(e.g., Shin Megami Tensei: Persona 4, Tales series)

• Retro: A style that incorporates pixelated looks of objects, characters, or environments that were
common in older games. (e.g., 3D Dot Game Heroes, Hotline Miami)

• Realistic: A style portraying objects, characters, or environments in a realistic manner. (e.g., Final
Fantasy XIII, Halo 4)

• Abstract: A style that uses simple forms, colors, and lines. (e.g., Lumines, Dyad)

• Handicraft: A style where objects, characters, or environments look like they are hand-made. (e.g.,
Little Big Planet, Platypus)

• Watercolor: A style where objects, characters, or environments look like they are painted in
watercolor. (e.g., Okami, Braid)

• Cel-shaded: A style that renders light and shadow to enhance the illusion of a 3D surface. (e.g., The
Legend of Zelda: The Wind Waker, Catherine)

• Wireframe: A style of revealing the design structure of 3D objects with lines and curves. (e.g.,
Battlezone, Stellar 7)


This facet focuses on the “look” of a game from an artistic or aesthetic point of view. Extant terms describing 
the artistic style of games were poorly described and rarely defined. Additional complications arose, such 
as how to deal with games that were intended to look “realistic” yet now look “retro” because of technical 
limitations at the time of the game’s creation (e.g., Final Fantasy VII). Do we represent the intention of 
the creators or the actual display? If the actual display is represented, then how might this information 
change over time? It is quite possible that games we currently perceive as Realistic will be perceived as
Retro after 20 years.


4.1.7 Temporal aspect


Temporal aspect is defined as “the methods by which time passes in the game and/or manner in which 
events take place.” We identified the following seven foci: 


• Real-time: The game time progresses continuously and actions are performed in real-time. In battles
and combat, all the units act simultaneously and the player is expected to act quickly to eliminate
the enemies. (e.g., Star Ocean, Kingdom Hearts)

• Turn-based: The game time is divided into turns and actions are performed by the players taking
turns. This allows players to take time to make strategic decisions. (e.g., Final Fantasy X, Valkyrie
Profile)

• Time manipulation: Players are able to manipulate time by taking certain actions (e.g., changing
day to night by playing a song) or change the time flow in the game. (e.g., The Legend of Zelda:
Ocarina of Time, Prince of Persia)

• Time travel: Players are able to move between different points of time in the same timeline. (e.g.,
Chrono Trigger, GrimGrimoire)

• Multiple game clocks: Players are able to move between different points of time in multiple timelines
that might converge or stay independent. (e.g., Final Fantasy XIII-2, Radiant Historia)

• Calendar-based game clock: The game time progresses based on a calendar, sometimes regardless of
players participating in game actions. (e.g., Shin Megami Tensei: Persona 3, Animal Crossing)

• Timed action: Players must complete a certain action in a given amount of time in order to
successfully progress in the game. (e.g., Trauma Center, Time Crisis)


Certain foci will be more relevant for games employing particular types of gameplays: for instance, timed 
action may appear more often in Simulation or Shooter games, time manipulation in RPGs or 
Action/Adventure games, and so on. Multiple foci may be applicable for certain games (e.g., Real-time, 
Time travel, and Time manipulation in The Legend of Zelda: Ocarina of Time). 
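
Since several foci of one facet can apply to the same game, a set per game is a natural representation. The sketch below is illustrative only: the first entry follows the Ocarina of Time example in the text, the other two follow the examples attached to the foci definitions, and `games_with` is a hypothetical helper.

```python
# Temporal-aspect foci per game, drawn from examples given in this section.
temporal = {
    "The Legend of Zelda: Ocarina of Time": {"Real-time", "Time travel", "Time manipulation"},
    "Chrono Trigger": {"Time travel"},
    "Final Fantasy X": {"Turn-based"},
}

def games_with(*foci):
    """Return titles whose temporal aspect includes every requested focus."""
    wanted = set(foci)
    return sorted(title for title, assigned in temporal.items() if wanted <= assigned)
```

Requesting more than one focus narrows the result, which is the kind of post-coordinated query a faceted scheme is meant to support.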


4.1.8 Theme 


Theme is defined as “the common thread or ideas that recur in the game.” Theme can help represent the 
“aboutness” of the game regardless of the Gameplay or Style, allowing the collocation of games by theme 
despite these other facets. Some examples of foci include abstract concepts such as Death, Friendship, or 
Coming-of-age, entities such as Superheroes, Zombies, Robots, and Pirates, or subjects like Art, Music, 
Management, and so on. We have organized a total of 127 different themes under 22 main categories (i.e., 
Art & Design, Business, Children, Concept, Crime, End of the world, Fantasy, Food, History, Holidays, 
Law, Medicine, Nature, Politics, Religion, Science, Sci-fi, Sex, Sports, Supernatural, Travel & 
Transportation, and War & Fighting) and we anticipate this list will grow as we test larger numbers of 
games. 

Categorizing these themes was not trivial, especially given definitional criteria such as mutual
exclusivity and comprehensiveness. ‘User warrant’ will help keep this scheme relevant. Therefore we
evaluated themes based on the likelihood that users would seek games featuring a given theme. In order to
empirically ground this list of foci, we plan to conduct additional studies involving real users in our future
work.


4.1.9 Setting 


Setting is defined as “the surroundings or environment (spatial or temporal) in which the game takes place.” 
Currently the foci under setting are divided into two sub-categories: “spatial” (i.e., Asian, Casino, Castle, 
Desert, Game show, Hospital, Nature, Ocean, Rural, School, Space, Spaceship, Tundra, Urban, Virtual 
worlds, and Western) and “temporal” (i.e., Cyberpunk, Futuristic, Gothic, Historic, Medieval, Modern, 
Renaissance, and Steampunk). 


Our discussion evoked the following questions: how do we describe games such as Mario Kart where 
the environment of the game changes in each stage of the game? How about puzzle games such as Peggle 
that may feature a display of a particular setting such as Space or Desert in the background, but it does 
not significantly affect the gameplay or story? Again, user warrant could be helpful in determining when 
and how to apply this facet: in other words, would describing setting information in such games be 
potentially useful for users? For example, when searching for games featuring a Setting in Space, would 
users expect to see games like Mario Kart or Peggle in their results? Setting may not be relevant to all 
existing games, but rather only applicable to games with more complex environments. Future planned 
interviews with real gamers will help reveal useful applications of Setting. 


4.1.10 Mood/Affect 


Mood/Affect is defined as “the pervading atmosphere or tone of the video game which evokes or recalls a 
certain emotion or state of mind.” The role of emotions (e.g., pleasure, arousal, dominance) in playful 
consumption of games has been well-documented (Holbrook et al., 1984). As games feature increasingly 
complicated narratives, the relevance of this facet will increase. We identified fifteen common moods in
games including Adventurous, Aggressive, Cute, Dark, Horror, Humorous, Inspirational, Intense,
Light-hearted, Mysterious, Peaceful, Sarcastic, Sensual, Solitary and Quirky. Mood taxonomies established for
other media, like music moods from allmusic.com, may prove to be useful for expanding this list in future
work.


4.1.11 Type of Ending 


Type of ending is defined as “the method by which the player is led to gameplay culmination.” While this
information is often sought on video game web forums such as GameFAQs, it is not typically found on many
commercial websites. There are five foci for this facet:


• Branching: A game with multiple endings (e.g., Bioshock 2, Shadow Hearts).

• Circuitous: A game with a “new game plus” feature that allows players to start a new game after
completing the game once, while retaining some of the experience, status, or items in the newly
started game (e.g., The Walking Dead: The Game, Tales of Graces F).

• Finite: A game with a single terminal ending (e.g., Portal 2, Final Fantasy VII).

• Infinite: A game with no definite ending, such as one that is set in an open world (e.g., World of
Warcraft, Tiny Tower).

• Post-Game: A game with bonus content that can be unlocked after completing the game once, such
as post-game dungeons (e.g., Batman: Arkham Asylum, Valkyrie Profile 2: Silmeria).


4.2 Application of the Scheme 


As explained previously, we attempted to describe the subject of several sample games using the proposed 
genre scheme. Notable examples are provided in Table 2. In this scheme it is possible to apply multiple foci 
for each facet as necessary. For certain types of games, some facets may not be applicable, so they were
left blank in the table. For instance, a tile-matching puzzle game such as Bejeweled 2 does not have a
coherent theme, notable setting, or mood. Also, some older games, such as Super Mario Bros., which was
published in 1985, do not have rating information.
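
One way to picture a record under such a scheme is a mapping from each facet to a list of foci, with inapplicable facets left empty. The following is a minimal sketch under that assumption only. The class name `GameRecord`, its methods, and the facet labels "Theme" and "Setting" are hypothetical names chosen for illustration and are not part of the proposed scheme; the five ending types and the Bioshock 2 and Bejeweled 2 examples come from the text above, while the game with two moods is invented to show multiple foci.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five foci of the "Type of Ending" facet, as listed in Section 4.1.11.
ENDING_FOCI = {"Branching", "Circuitous", "Finite", "Infinite", "Post-Game"}


@dataclass
class GameRecord:
    """A game described by facets, each holding zero or more foci."""
    title: str
    facets: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, facet: str, focus: str) -> None:
        # A facet may carry several foci, so values accumulate in a list.
        if facet == "Type of Ending" and focus not in ENDING_FOCI:
            raise ValueError(f"unknown ending type: {focus}")
        self.facets.setdefault(facet, []).append(focus)

    def blank_facets(self, names: List[str]) -> List[str]:
        # Facets that do not apply are simply absent, i.e. left blank.
        return [name for name in names if not self.facets.get(name)]


bioshock = GameRecord("Bioshock 2")
bioshock.add("Type of Ending", "Branching")

# Hypothetical game (not from the paper) with two moods under one facet.
sample = GameRecord("Hypothetical Game")
sample.add("Mood", "Dark")
sample.add("Mood", "Mysterious")

# Bejeweled 2 has no coherent theme, notable setting, or mood.
bejeweled = GameRecord("Bejeweled 2")
print(bejeweled.blank_facets(["Theme", "Setting", "Mood"]))
```

Keeping each facet as a list rather than a single value is what allows multiple foci per facet, and an absent key stands in for a blank table cell.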


Table 2. Facets of Video Game Genres with Examples of Genre Labels Representing Each Facet


5 Conclusion and Future Work


This paper reports on a first step in understanding the complexity of video game genre in order to devise a 
more robust scheme for representing this information. An analysis of game labels used in game-related 
websites and catalogs revealed that the metadata element “genre” was heavily overloaded with multiple 
dimensions of information. Through the method of facet analysis, it is possible to ascertain and represent 
the different types of information embedded in current video game genre labels in a flexible and extensible 
way. 

Future work includes complete definitions for all the indexing terms under each facet (e.g., what is 
the definition of a Stealth style game?; what counts as a Cyberpunk setting?); the creation of additional 
metadata records to test the scheme; and usability studies involving real users of video games. Such studies 
will not only shed light on how gamers understand genre and the clarity of the terms and definitions 
developed here but also evaluate the usefulness of a faceted, multi-dimensional genre classification for 
locating and accessing games. 

This project is part of a larger research agenda to develop a metadata schema specifying important 
information features, their definitions, and attributes for video games. This schema will include the genre 
scheme described here as well as other types of metadata elements that are useful for a wide variety of users 
interested in video games. We have established a core schema containing the 16 metadata elements crucial 
in describing video games in any context (Lee et al., 2013) and we are currently in the process of developing 
a larger recommended set. We hope to augment existing standards in the LIS field, such as FRBR and
related standards, and to assist organizations such as SIMM and Common Sense Media by providing
a more formal metadata schema and encoding schemes that can be used across multiple game-related 
websites and other resources. Eventually, the scheme will be used to describe the video game collection 
owned by SIMM, from which a working catalog will be created, enabling users to search and browse games. 


Abstract

In light of students’ reliance on the Internet, their general lack of IL skills and unsophisticated criteria 
for evaluating online information, and the lack of consistent institutional IL training, new pedagogical 
models are needed to teach effective online IL skills. This research addresses the need for today’s students 
to learn to effectively evaluate online information and describes pilot tests of a prototype online credibility 
evaluation learning tool. Results of online and in-person pilot tests showed that students had positive 
responses to the tool and indicated that they found it useful and effective. Concrete suggestions for 
improving the tool were generated. This research investigates a new pedagogical model to teach IL and 
credibility evaluation skills situated in the online information environment. 


1 Introduction


Information literacy (IL) has been called a survival skill in the Information Age (ALA, 1989; Eisenberg, 
2010) and “a prerequisite for participation in society and the work force” (US 21st Century Workforce 
Commission, 2000). It has also been described as the critical literacy of the 21st century and the foundation 
of learning in our contemporary environment of continuous technological change (Bruce, 2004). The
American Library Association’s definition, which has become widely accepted in academic libraries, states:
“To be information literate, a person must be able to recognize when information is needed and have the
ability to locate, evaluate, and use effectively the needed information” (ALA, 1989). Academic
and accreditation agencies include IL goals in their educational standards, including the Association of 
College and Research Libraries and the Middle States Commission on Higher Education (ACRL, 2000; 
MSCHE, 2003). 

Expanding on the ALA definition, IL is now seen as not just a single skill but a set of skills that 
are increasingly recognized as critical to success in today’s economy and society, with several professional 
organizations including IL skills in their official standards. The Partnership for 21st Century Skills’ 
“Framework for 21st Century Learning” describes the “skills, knowledge and expertise students should 
master to succeed in work and life in the 21st century” (Partnership for 21st Century Skills, 2011), among 
which are information literacy and critical thinking. Another professional organization, the International 
Society for Technology in Education (ISTE), developed their National Educational Technology Standards, 
described as “the standards for learning, teaching, and leading in the digital age” (ISTE, 2012), which 
include “Research and Information Fluency” and “Critical Thinking, Problem Solving, and Decision 
Making.” A report from the Georgetown University Center on Education and the Workforce states that 
competencies such as critical thinking, active learning, and complex problem-solving are required for success 
in STEM (Science, Technology, Engineering, Mathematics) occupations, which are critical to our nation’s 
continued economic competitiveness (Carnevale, Smith, & Melton, 2011). These standards from professional
organizations indicate that IL skills are valued and needed not just in academia, but in the professional
workplace as well. Students benefit from these critical skills throughout their lives, as they are key to 
preparing students for life-long learning (ALA, 1989; Daugherty & Russo, 2011). 


2 Literature review


A key component of information literacy is the ability to evaluate the quality of information sources. In 
today’s information environment, evaluating the credibility of online information sources may be difficult 
for students due to the volume and diversity of sources and the lack of conventional quality control 
mechanisms and indicators of authority from traditional print-based formats (Rieh, 2002; Gasser et al., 
2012; Metzger et al., 2010). Historically, the markers of credibility in print-based information sources were 
maintained by professional gatekeepers such as editors and reviewers (Rieh & Danielson, 2007). One of the 
chief differences between the web and traditional sources of information is that the web often lacks the filters
and markers of institutional credibility and authority which promote reliability in many print sources 
(Burbules, 2001; Mackey & Jacobsen, 2011). Overall, web pages typically offer few reliable cues to credibility 
that students can use in their evaluations (Iding et al., 2008). Today’s students must learn a new set of 
evaluation skills. 

Despite the attempts of IL instruction programs to instill critical evaluation skills, research shows 
that college students rarely evaluate the quality of information sources that they find online (Becker, 2003; 
Julien and Barker, 2009; Kolowich, 2011; Parker-Gibson, 2005; Walraven et al., 2009). Overall, students 
have trouble evaluating information and do not have a critical attitude towards information on the web 
(Brand-Gruwel et al., 2005). This is a particularly urgent problem since the web is the first choice of
information source for most students (Curtis, 2000; Herring, 2011; Mizrachi, 2010; Swanson, 2005). Costello
et al. (2004) note that students with an “information-age mindset” rely almost exclusively on the web for
all their information needs. College students overwhelmingly rely on Google to the exclusion of scholarly
databases and library research tools (Hargittai et al., 2010; Head & Eisenberg, 2011; Kim & Sin, 2011;
OCLC, 2002). Students favor tools that require little skill, “appear satisfied with a very simple
or basic form of searching,” and assume that “search engines ‘understand’ their queries” (Rowlands et al.,
2008, p. 297). In addition, students tend to demonstrate inflated views of their own IL skills, especially
students with lower level skills whose lack of skill hinders their ability to accurately assess their own
performance or to recognize expertise in others (Gross and Latham, 2007).

Studies in the library community demonstrate that IL instruction has a positive impact on student 
skills, performance and academic achievement. College students who participate in information literacy 
classes report significantly less library anxiety (Van Scoyoc, 2003), and high-achieving students are more
likely to report experiencing formal information literacy instruction (Smalley, 2004; Gross & Latham, 2007). 
Wang (2006) found statistically significant differences in grades between college students who took a library 
credit course and students who did not, and those who had taken the instruction in library skills received 
higher grades on their papers and in their courses. Selegean, Thomas & Richman (1983) also found a 
statistically significant improvement in the academic performance of those college students who had 
completed the library instruction course over those students who had not. Ren (2000) found that receiving 
library instruction significantly increased college students’ self-efficacy in electronic information searching. 
School library studies have also shown IL’s positive effect on high school student attitudes and achievement. 
Goodin (1991) showed that IL instruction makes a significant impact on high school students’ attitudes and
performance and helps prepare them for college; Lance et al. (2000) showed that school library programs 
increased high school student reading scores; and Todd et al. (1992) demonstrated positive impacts on high 
school students’ learning processes and outcomes. 

While stakeholders in higher education and in professional societies agree that IL is necessary to 
students’ success in their education and afterward in their work and personal lives, only a small percentage 
of higher education institutions include a required information literacy component (Boff and Johnson, 2002).
Where they are instituted, traditional library-based IL training methods (one-shot sessions, tutorials, 
worksheets) are often simplistic, not customized to the online information environment, and rely on a 
traditional classroom-based pedagogical model, and thus may not connect effectively with today’s students 
(Manuel, 2002; Gibson, 2008; Leach and Sugarman, 2005). These brief training sessions may be the only
explicit and focused exposure to IL that most students receive; however, the limited time and contact with
students make it difficult for librarians to keep students interested and engaged (Doshi, 2006). When
learning new skills, today’s students often prefer active involvement in the learning process, and a
networked, participatory learning environment (Davidson & Goldberg, 2009; Halse & Mallinson, 2009;
Thomas & Brown, 2011). Overall, one-shot instruction sessions rarely provide students with the engagement
and sustained practice required to learn, apply and master IL competencies (Mokhtar et al., 2008; Mery et
al., 2012).

As an alternative to library-based instruction, learning software applications can support students
in learning IL skills in a networked, participatory learning environment through the use of instructional
scaffolding. These computer-based learning environments incorporate scaffolding, defined as “instructional 
support in the form of guides, strategies, and tools that are used during learning to support a level of 
understanding that would be impossible to attain if the students learned on their own” (Azevedo, 2005, p. 
199). These instructional scaffolds can help students to work through a difficult task and attain a higher 
level of proximal development that would be beyond their unassisted efforts (Ge & Land, 2004). With the 
assistance of scaffolds, learners can bridge the gaps between their current abilities and intended learning 
goals that would be unachievable through their unassisted effort alone (Rosenshine & Meister, 1992). The
use of instructional scaffolding can help students to develop strategies to be more critical in their evaluation
of the credibility of web sources (Iding et al., 2008).

In light of students’ reliance on the Internet, their general lack of IL skills, limited critical evaluation 
practices, and the lack of effective institutional IL training, new pedagogical models are needed to teach 
effective online IL skills. Specifically, there is a need for IL training that is customized to the online 
information environment and relevant to the research habits of today’s students. If students are to 
effectively evaluate the credibility of online information sources, they must learn the specific criteria on 
which to judge the credibility of these sources, and the evidence necessary to support their evaluations 
(Metzger, 2007; Harris, 2008). They must also learn to base their judgments on evidence-based source 
characteristics rather than relying on subjective judgments based on intuition or projection (Markey, Rieh 
& Leeder, in press). A new pedagogical model to address these issues should be provided through structured
scaffolding that supports students in reflecting on their learning. Developing students’ critical skills regarding
online credibility evaluation, and helping them learn a structured process based on specific criteria and
make judgments based on specific evidence, will help students become critically aware users of online
information, and will prepare them for lifelong learning.


3 InCredibility: A prototype online-credibility-evaluation learning tool 


To address these issues, a custom-built prototype online credibility evaluation learning tool called 
“InCredibility” has been developed. The objective of the tool is to teach students how to evaluate the 
credibility of online information based on specific criteria and using specific evidence-based source
characteristics. InCredibility situates IL instruction in the online environment where students actually do 
their research, and guides them through a structured process of evaluating online information in an 
interactive format. In addition, the online participatory tool can be used collaboratively by a large class 
while researching information for an assignment. The tool consists of a tutorial introducing the basic 
questions to ask when evaluating the credibility of online information (Who, What, Where, When and 
Why) and an interactive support feature to guide students in how to use the specific elements of websites 
to gather evidence for their evaluation (Figure 1) followed by an interactive exercise in which they practice 
locating the parts of the website which can provide evidence for their answers (Figure 2). Pop-up boxes 
give feedback on correct and incorrect answers, and a Tip button is available to give more guidance on 
finding the correct evidence. 


Figure 1: Tutorial instruction page


Using this example page, find evidence to answer the questions Who, What, Where, When and Why.
Select the section of the page that answers each question.


Figure 2: Interactive tutorial


Once students complete the tutorial, they begin the structured evaluation process, broken down into stages
titled Investigate, Question, and Solve. In the Investigate stage, they search for online sources about their
research topic. First, they use a browser-based plugin, the Notebook, to gather evidence for answering the
evaluation questions (Figure 3). The Notebook prompts students to answer the Who, What, Where, When
and Why questions with specific evidence from the website they have found.


Figure 3: Notebook open in browser


The responses students enter in the Notebook are saved to the InCredibility website. During the Investigate
stage (Figure 4), students can review and revise their answers to the credibility questions.


Figure 4: Investigate stage


The Question stage (Figure 5) asks students to evaluate other students’ responses, and prompts reflection
on their own work and the different ways that others may evaluate evidence. This helps reinforce that
credibility judgments can be subjective.


Figure 5: Question stage


In the Solve stage (Figure 6), students compare two randomly chosen sources and their evaluations, and
make decisions about which source is best for their research topic. This stage guides students through 
synthesizing multiple types of evidence to make a credibility evaluation, and reinforces that credibility is 
not a single factor but incorporates many elements. 


Figure 6: Solve stage


Each stage of the InCredibility tool builds upon the previous stage, providing students with multiple 
opportunities to practice applying the credibility criteria and reinforcing their learning through reflection 
on their own work and that of their peers. The original five questions are matched to higher-level concepts 
(Authority, Relevance, Reliability, Currency, and Purpose), providing instructional scaffolding from 
everyday terminology to expert terminology. Through a step-by-step process of web-based tips and scaffolds, 
including visual process maps, progress monitors, and reflective questions, students learn to plan, monitor 
and reflect on their learning. 

The design of the tool follows Quintana et al.’s Scaffolding Design Framework of supporting 
sensemaking, process management, and reflection and articulation (2004). Learning is scaffolded by the 
structured decomposition of tasks into discrete units, and the segmentation of the learning goal into stages. 
Since novice learners usually have weak metacognitive skills, which are important for engaging in complex 
practices like online credibility evaluation, the prototype learning tool provides needed practice and
reinforcement of these important skills (Quintana et al., 2005). 


4 Pilot testing the prototype 


The purpose of this research was to assess experimentally the functionality and understandability of the 
prototype tool and its interface, and to gather feedback from students on their experience using the 
prototype. Pilot testing of the prototype consisted of two phases: online testing of the tutorial section and 
in-person testing of the complete prototype. Students in a large, introductory undergraduate course were 
invited to participate in both stages of the pilot testing. Students were offered extra credit in the class for 
their participation. IRB exemption for the pilot tests was secured. 


4.1 Online tutorial pilot test 


The first stage of pilot testing focused on the tutorial portion of the tool, since this is the initial stage which 
introduces students to the concepts and skills they will learn using the tool. Fifty-five students completed 
the online pilot test, then answered an online survey regarding the usability of the tutorial and their 
experience using it. Students were asked their year in college and their level of experience with searching 
for information online. Demographics of the online pilot test subjects are shown below in Tables 1-2: 


Option       Response   %
Freshman     12         22%
Sophomore    19         35%
Junior       17         31%
Senior       7          13%
Total        55         100%


Table 1: Year in college 


Option                      Response   %
Not at all experienced      0          0%
A little experienced        5          9%
Average experience          34         62%
Above average experience    16         29%
Total                       55         100%


Table 2: Level of experience with searching for information on the Internet 


The focus of the survey questions was on functionality, understandability of instructions, questions, and 
tips incorporated in the tool, and the terminology employed. Subjects were also asked for any other feedback 
they would like to provide on the experience of using the tool. 

A high-level summary of the survey responses is shown below: 


• Functionality: 75% of subjects indicated that they did not experience any issues or problems with
  the tool’s functionality.

• Instructions: 80% of subjects indicated that the instructions for the tutorial were clear; 20%
  indicated some confusion.

• Questions: 55% of subjects indicated that they had trouble answering some of the questions,
  although some of these difficulties were due to technical issues.

• Tips: 52% of subjects did not use the Tip button, and some did not even notice it.

• Terminology: Some subjects did not understand the 5Ws terminology, especially Where and Why.

• Comments: Several students suggested better introductions and definitions of terms, as well as a
  summary/review at the end of the tutorial.


Overall, subjects responded positively about the tutorial’s functionality and usability. The primary areas of 
difficulty were the insufficient instructions and lack of use of the Tip button. 


4.2 In-person pilot test of prototype


The second stage of pilot testing involved an in-person walkthrough of the complete prototype, to gain 
individualized feedback from users about the usability of the tool. Eight students completed the in-person 
pilot test. Students were asked their year in college and their level of experience with searching for 
information online. Demographics of in-person pilot test subjects are shown below in Tables 3-4: 


Option       Response   %
Freshman     1          12.5%
Sophomore    4          50%
Junior       2          25%
Senior       1          12.5%
Total        8          100%


Table 3: Year in college 


Option                      Response   %
Not at all experienced      0          0%
A little experienced        0          0%
Average experience          2          25%
Above average experience    6          75%
Total                       8          100%


Table 4: Level of experience with searching for information on the Internet 


The test was conducted in a computer lab with the researcher present. Subjects were asked to “think aloud” 
as they used the tool, starting with the tutorial and following the three stages of the structured credibility 
evaluation process. If subjects failed to think aloud, the researcher asked prompting questions about what 
they were doing and why. As with the online pilot test, the focus of the questions was on functionality, 
understandability of instructions, questions, and tips incorporated in the tool, and the terminology 
employed. 


Summary of in-person responses: 


• Functionality: Most participants liked the functionality of the Notebook, the structure of the 5Ws
  and the sequence of stages.

• Instructions: Most participants said that they needed more and clearer instructions and definitions,
  and some suggested a video intro.

• Tips: Most participants didn’t use the tips, or sometimes did not notice the button, due to color
  and placement.

• Comparing sources: Most participants liked the comparison of two sources side by side in the Solve
  stage.

• Evaluations: Some participants were confused by the task in Solve, and were unsure if they were
  judging the comments added by other students or their own evaluation of the quality of the source.

• Terminology: Some students indicated confusion over terminology, including “currency,” “sources,”
  and “keywords.”


One highlight of the in-person pilot test was this example of feedback, which demonstrates the student’s
learning:


“I really like [InCredibility] because it makes me go over the source, the text, really well. It forces
me to look for more information about the source and about the author, what he’s really talking 
about... I think it really helps me focus on an article more, instead of just skimming it. It definitely 
takes more time but I feel like I’m getting so much more than what I would just do on a skimming 
basis... (It’s) a fun way to approach an instructional thing to do during class, ‘cause it’s on the 
Internet so I feel you’re more engaged than if it were through a presentation.” 


Overall, subjects in the in-person pilot test responded positively about the tutorial’s functionality and 
usability. Again, the primary issues were the need for more instructions and lack of use of the Tip button. 


5 Discussion 


Comparing the demographics of the two subject groups, the subjects of the in-person test reported being
slightly older and slightly more experienced than those of the online test, although not significantly so. This may reflect
the greater willingness of more experienced students to volunteer for a study. Overall, these subjects 
generally represent the college student audience for which the online credibility evaluation tool is intended. 
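
The comparison can be made concrete with a rough calculation from the response counts in Tables 1-4. The sketch below is illustrative only: coding the ordered categories as 1 to 4 and averaging them is an assumption made here, not an analysis reported in this study, and the function name `mean_code` is hypothetical. The online Senior count of 7 is inferred from the reported total of 55.

```python
# Response counts copied from Tables 1-4. Coding the categories as
# 1..4 is an assumption made only for this illustration.
year_online = [12, 19, 17, 7]    # Freshman .. Senior (n = 55; 7 inferred from total)
year_in_person = [1, 4, 2, 1]    # Freshman .. Senior (n = 8)
exp_online = [0, 5, 34, 16]      # Not at all .. Above average (n = 55)
exp_in_person = [0, 0, 2, 6]     # Not at all .. Above average (n = 8)


def mean_code(counts):
    """Mean of the ordinal codes 1..len(counts), weighted by response counts."""
    total = sum(counts)
    return sum(code * n for code, n in enumerate(counts, start=1)) / total


for label, online, in_person in [
    ("year in college", year_online, year_in_person),
    ("search experience", exp_online, exp_in_person),
]:
    print(f"{label}: online {mean_code(online):.2f}, "
          f"in-person {mean_code(in_person):.2f}")
```

Under this coding, the two groups differ only marginally in year in college and somewhat more in self-reported search experience (3.2 online versus 3.75 in person), though with only eight in-person subjects the comparison is indicative at best.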

Results of the online pilot test showed that the functionality of the online tutorial worked well as 
participants reported few problems with usability, although the instructions and definitions can be 
expanded. Subjects provided some useful suggestions, such as adding a review of the tutorial content at the 
end to reinforce the material. The online survey seemed to be an effective way to gather feedback on an 
online tutorial from a large group of students. 

Results of the in-person pilot test showed that participants responded positively to the structure 
and content of the tool. Overall functionality worked well, although again the instructions were insufficient. 
This underscores the importance of pilot testing to understand the students’ perspective and gain insight 
into the understanding that they bring with them and the necessity of detailed description of the concepts 
they are learning. Subjects provided some useful suggestions, such as adding a video introduction to the 
tutorial and tool functionality, and providing a review of the tutorial content at the end. Although the in- 
person pilot testing took more time and restricted the number of participants that could be interviewed, it 
produced in-depth and helpful feedback on the tool. The researcher was able to probe participants’
perceptions of the tool and their experience using it.

In both pilot tests participants often did not use the Tip button. In the in-person test, participants 
did not seem to notice the Tip box until the researcher pointed it out to them. When asked about this, 
they often replied they simply did not notice it. This may be due to the color and placement of the button
making it less noticeable. Participants also seemed to ignore pop-ups that provided guidance and suggestions
and closed them without reading the text, perhaps dismissing them as error messages and not realizing that 
there was helpful information included. 

Interestingly, many students reported that they had never investigated the “About” section of a 
website, which often states a site’s purpose or background, before using InCredibility. This relates to the 
tool’s objective of teaching students how to evaluate the purpose of a website, which can be one of the most 
difficult tasks of evaluating credibility. Thus, the tool introduced them to a new evaluation technique that 
they had not learned before. 


6 Conclusion


Overall, the participants in both pilot tests responded favorably to the experience of using the tool and 
indicated that its functionality and usability were effective. There did not seem to be any major issues with
the use of the Notebook, the concept of the 5Ws, and the structure of the three stages (Investigate, Question, 
and Solve). Several subjects mentioned that they found the tool to be useful, with subjects expressing 
positive reactions to the skills practice they experienced using the tool. Some subjects stated that critiquing 
other people’s work helped them learn how to evaluate better. 

Several design changes have been made as a result of these findings, specifically to address students’
failure to read the guidance given in the pop-up boxes and their lack of use of the Tip button. The word
“TIP” was added to the text of the pop-ups to draw the reader’s attention to the suggestions. The “Close”
button on the pop-ups, which was located at the top corner of the box, was moved to the bottom corner,
in the hope that students will read the text before closing the pop-up. The Tip box color was changed to
a bright red and its design was changed to make it stand out and be more apparent to users.

More explicit instructions and definitions are clearly needed, along with clarifying the purpose of 
the tasks and simplifying terminology. On a related note, results showed that the tool should not rely on 
library terminology, but phrase concepts in terms that students understand. Since students may be learning 
skills that they have never used before, they need clear and understandable explanations given in language 
they understand, especially at the introductory stages. Later, the instructional scaffolding of the tool, which 
helps bridge students to more advanced topics that they might not learn on their own, guides them through 
the learning process. 

This initial research suggests that the custom-built prototype online credibility evaluation learning 
tool can be used to support students in becoming more critical in their evaluation of the credibility of web 
sources. The prototype online credibility evaluation tool represents a novel pedagogical model to teach 
online IL skills. The tool situates IL instruction in the online environment where students actually do their 
research, and guides them through a structured process of evaluating online information in an interactive 
format. The use of instructional scaffolding can help students to develop evaluation strategies, learn the 
specific criteria on which to judge the credibility of online information sources, and the evidence-based 
source characteristics necessary to support their evaluations. Students are also supported in understanding 
IL as a structured process requiring practice, planning and reflection. Since this online participatory tool
can be used simultaneously by all students in large classes, it enables providing IL training to greater
numbers of students than traditional methods. Ultimately, more effective delivery of training in IL skills
will help today’s students in learning critical 21st century skills and prepare them for lifelong learning.


7 Future research 


The prototype online credibility evaluation learning tool will be revised and expanded based on the results 
of these pilot tests, especially in the area of instructions and terminology. After this upgrade, the finalized 
tool will then be tested in a randomized experimental study with college students. Results of the experiment
will be analyzed for evidence of learning gains by students who use the tool vs. students who receive more 
traditional forms of IL instruction. The goal of the research is to provide a fully functional, experimentally- 
tested tool which can be adopted by IL instructors to teach credibility evaluation skills to large classes. 
Abstract 

This paper presents a user-generated framework for designing affordances that would counter acts of 
cyberbullying on social media sites. To do so, we used narrative inquiry as a research methodology, which 
allowed our two focus groups — one composed of teens and the other of undergraduate students — to map 
out a cyberbullying story and overlay it with a set of design recommendations that, in their view, might 
alleviate mean and cruel behavior online. Four “cyberbullying stories” were constructed by the 
participants, each one revealing two sub-plots — the story that “is” (as perceived by these participants) 
and the story that “could be” (if certain design interventions were to be embedded in social media). In 
this paper, we describe seven emergent design themes evident in the participants’ design 
recommendations for social media: design for reflection, design for consequence, design for empathy, 
design for personal empowerment, design for fear, design for attention, and design for control and 
suppression. 
1 Introduction 


This article presents a user-generated conceptual framework for understanding and guiding the design of 
social media that counteracts or prevents mean and cruel online behavior. It does so through the use of 
narrative inquiry, a research method that allowed teens and young adults to map out a cyberbullying story 
and overlay it with a set of design recommendations that, in their view, might prevent or alleviate 
cyberbullying. 

Two focus groups — a group of four teens in high school and a group of five young adults completing 
their undergraduate studies — used storytelling and sketching to elicit visual narratives that communicated 
their perceptions of cyberbullying and to propose design features that might shape the cyberbullying story 
in a more positive direction. Four “cyberbullying stories” were constructed by the participants, each one 
revealing two sub-plots — the story that “is” (as perceived by these participants) and the story that “could 
be” (if certain design interventions were to be embedded in social media). 


2 Cyberbullying 

Bullying as a major public health concern is a historic problem, but 21st century technologies have allowed 
for it to assume new characteristics and have introduced new tactics for aggressive behavior (Juvonen and 
Gross, 2008, p. 497). “Cyberbullying,” as a distinct form of bullying, has consequently entered the vernacular 
in recent years, with scholars pointing to the intentional use of technology as a means to hurt another 
individual. In the hands of young people who are still developing their impulse control and are particularly 
vulnerable to peer pressure, social media can allow for “online expressions of offline behaviors” and facilitate 
negative and damaging activities, one of which is cyberbullying (O'Keefe et al., 2011, p. 800). 

Cyberbullying, while lacking in a consensus definition (National Science Foundation, 2011; Stewart 
and Fritsch, 2011), reflects the core elements of bullying as it is traditionally understood. Researchers at 
the National Science Foundation observe that the variety of definitions of cyberbullying “typically start 
with three concepts: intent to harm, imbalance of power and usually a repeated action, although some 
experts replace ‘repeated action’ with ‘specific targets’ ” (National Science Foundation, 2011). The use of 
electronic technologies to carry out this intent, display this imbalance of power, and target others repeatedly 
in cyberspace is, naturally, another component of the definitions surrounding cyberbullying. Brady (2010) 
describes cyberbullying as “the use of communication-based technologies, including cell phones, e-mail, 
instant messaging, text messaging, and social networking sites, to engage in deliberate harassment or 
intimidation of other individuals or groups of persons using online speech or expression” (p. 113). Patchin 
and Hinduja (2006) are briefer in their description, defining cyberbullying as the “willful and repeated harm 
inflicted through the medium of electronic text” (p. 152). 

While the academic literature provides this definitional context, Alice Marwick and danah boyd 
(2011) explain that the language adults use to speak about cyberbullying may differ from the language 
used by the young people who are involved in or who observe this online behavior. Rather than 
characterizing instances of online name-calling, arguments, and discord as “cyberbullying,” Marwick and 
boyd explain that teens attach the label of “drama” to these incidents. 

Cyberbullying has qualities that are distinct from bullying in its more traditional form of direct, 
face-to-face interaction between the dominant individual (the bully) and the less dominant individual (the 
victim). Juvonen and Gross (2008) cite the pervasiveness of Internet use, coupled with the absence of adult 
supervision in online environments, as creating “a fertile ground for bullying” beyond school grounds (p. 
497). In this electronic environment, bullies may feel both a sense of anonymity and distance, feelings that 
can promote harmful behavior (Mason, 2008; Suler, 2004; Trolley et al., 2006; Williard, 2005). The online 
environment provides an apposite set of factors for bullying to occur. Current research in online behavior 
and cyberbullying suggests that people with depression (with which both the perpetrator and the target often 
struggle) tend to prefer online social interaction, which may drive more behavior into the “cyber” context of 
bullying (Caplan, 2003). Further, online communication can be less tempered, and more emotionally and 
socially charged than face-to-face communication. Though anonymity is frequently pointed out as a 
disinhibiting factor in online interactions, Suler points to five additional factors at play with relevance to 
cyberbullying: invisibility, asynchronicity, solipsistic introjection, dissociative imagination, and 
minimization of authority, each of which can take place in non-anonymous contexts such as social media 
(Suler, 2004). 

Moreover, Rogers (2010) observes that the nature of cyberbullying allows for harassment and 
intimidation to gain entry into environments that were “safe” from traditional acts of bullying. She writes, 
“Cyberbullying can take place at any time during the 24-hour day...This can be responsible for a large part 
of the emotional damage inflicted on victims, who then feel they have no refuge, no one to trust and can 
never be safe anywhere” (p. 13). As Slonje and Smith (2008) remark, a victim of bullying in its traditional 
sense may be able to find solace in the knowledge that the bullying would remain on school grounds. In the 
case of cyberbullying, the victim may receive texts, emails, and messages via social networking sites in their safe 
place and at all times. More recent findings by Sevcikova, Smahel, and Otavova (2012) show that 
adolescents (research subject group aged 15-17) perceive that online bullying is entangled with the social 
milieu of school and that the victim is aware that a group of peers is witnessing or observing 
the behavior — a common configuration in social media. This may render the feelings of victimization and 
transgression as more intense than if the bullying were enacted in physical space. 
Both the scholarly literature and popular media reveal that bullying behavior may have lasting and 
devastating consequences. Juvonen, Graham, and Schuster (2003) characterize bullying as a “major public 
health concern facing youth” and describe adjustment difficulties, mental health challenges, and violent 
behavior as among the effects of bullying. Patchin and Hinduja (2010) speak to the relationship between 
cyberbullying and low self-esteem, problematic behavior in school, and family discord. Moreover, as news 
coverage of cyberbullying has made evident, there have been incidents in which cyberbullying victims have 
taken their own lives. Consequently, efforts have been made to combat cyberbullying through intervening 
measures. The literature suggests that these interventions can be classified into three types: 1) law and 
policy; 2) curriculum and campaigns; and 3) technological responses. By employing narrative inquiry to 
elicit user-generated cyberbullying narratives and design solutions, this study explored the third category 
of interventions, technological responses. 


3 Methods 


The study used narrative inquiry, a qualitative methodology that is most commonly employed in educational 
research, to uncover user-generated design interventions for preventing or countering cyberbullying 
behavior. Defined by Connelly and Clandinin (2006) as “the study of experience as story” (p. 477), narrative 
inquiry research is characterized by the narrative serving as both the object of study and the method 
(Connelly & Clandinin, 1990). Our implementation of narrative inquiry departed from the more common 
use of this approach as a means to unveil participants’ lived experiences. Aware of the pervasiveness of 
cyberbullying as a societal problem, our intention was not to have our participants divulge personal, 
sensitive, or traumatic stories to us and their fellow participants. We instead probed our participants’ 
perceptions of the cyberbullying experience as they imagined it would be for someone else, asking them to 
tell us a story about “mean and cruel behavior online.” Bowler et al. (2013) provide a further description 
of the procedures used in this study and narrative inquiry as a methodology. 

During the Spring 2012 term, we conducted two storytelling sessions on campus with nine 
participants. The first session was with five undergraduate students, all women in their early 20s and all 
from the University of Pittsburgh. The second session, held one month later, was with four teens: one girl 
and three boys between the ages of 14 and 17. Convenience sampling was used to recruit participants for this 
study. Parental consent, as well as assent from the teens, was received just prior to the start of the 
storytelling session with teens. Parents did not stay in the room during the session. 

The storytelling sessions took place on two Sunday afternoons in a classroom at the University of 
Pittsburgh’s School of Information Sciences. The first session with the undergraduate students lasted three 
hours, with equal portions allocated for sketching and group discussion. The five undergraduates were 
divided into two smaller groups — one group with three participants, the other with two. In order to better 
accommodate our teenage participants, we shortened the protocols with the teen group, focusing more on 
the sketching and less on the group discussion before and after sketching. This second session with teens 
ran for two hours. The four teens were also divided into two smaller groups — one with two boys and the 
other with one girl and one boy. In both storytelling sessions, there were three investigators in the classroom. 
Two investigators interacted with the participants while the third investigator observed and took notes. 
Informed by Marwick and boyd’s (2011) findings, we aimed to allow the young people to use their own 
words to label the roles and events in their narratives of online conflict, rather than insert the language 
that is commonly used in mass media and which tends to reflect an “adult” point of view. We asked them 
to simply tell us a story that depicted “mean and cruel behavior online.” In total, the study resulted in four 
narratives telling the story of bullying in social media environments and a set of accompanying design 
interventions that might alleviate or even intervene in mean and cruel online behavior. 
Figure 1: A cyberbullying narrative. Sticky notes indicate places where a design intervention should be 
inserted into the narrative. 


4 Results 


Four “cyberbullying stories” were constructed by the participants, two by the teen group and two by the 
undergraduate student group (Figure 1 depicts one of the cyberbullying narratives by an undergraduate 
group). While we asked the participants to frame their thoughts about cyberbullying around a narrative, 
their stories do not necessarily follow a traditional narrative arc, with a proper beginning, middle and end 
that is usually in the shape of a resolution of a problem. In fact, we found several of the stories were quite 
post-modern in their messy storylines and ambivalent conclusions, a reflection perhaps of the very nature 
of social media. The complex web of relationships and “storylines,” shifting roles, and ever-morphing 
outcomes in social media seems to preclude a neat and tidy ending. 

After the participants had generated their visual stories illustrating “mean and cruel behavior 
online,” they were then asked to think about design interventions that would discourage or prevent such 
behavior. The participants wrote their ideas on sticky notes and then stuck the notes at the point in their 
story at which the design interventions were supposed to work. Present in the narratives and discussions of 
the narratives that “could be” are the following design themes: design for reflection, design for consequence, 
design for empathy, design for personal empowerment, design for fear, design for attention, and design for 
control and suppression. Many of the design features suggested by the participants elicit a range of 
provocations and thus find themselves classified under more than one theme. Table 1 below describes the 
design themes and associated features that the teen and young adult participants highlighted. 


Design Theme: Design for Reflection 
Design Principle: Design that creates a pause, slowing users down so they can consider the ramifications of their actions. 
Design Features: Pop-up warning about cyberbullying timed to last for ten seconds so that users can stop and think. Alert boxes with reflective questions anytime one clicks “like”, asking the user “why do you like this?” 

Design Theme: Design for Consequence 
Design Principle: Design that ensures that there are consequences for bullying behavior. 
Design Features: Public shaming through a “bully button”. Facebook-imposed restrictions as a punishment for bullying behavior. Reports of inappropriate online behavior sent to perpetrator’s school. 

Design Theme: Design for Empathy 
Design Principle: Design that can make pain and sadness concrete, allowing bullies and their followers to see how victims suffer. 
Design Features: Design affordances such as sad music and emoticons. Design features that create a more emotive social media environment. 

Design Theme: Design for Empowerment 
Design Principle: Design features that redress an imbalance of power. 
Design Features: Adult interventions figure largely in this design feature. The system facilitates adult interaction, thereby lending the power of adults to the victim. Adults post supportive messages or warn the bullies that adults are watching. 

Design Theme: Design for Fear 
Design Principle: Design that harnesses the power of fear. 
Design Features: A “bully button” and the use of personalization, both of which send the message that “you’re being watched”. 

Design Theme: Design for Attention 
Design Principle: Design that catches the attention of bullies. 
Design Features: Anti-bullying messages that are prominent, loud, personalized, and even irritating. Bright colors should be used. 

Design Theme: Design for Control and Suppression 
Design Principle: Design that would trigger the suppression of content either by Facebook administrators or through an algorithm. 
Design Features: The system alerts Facebook staff when there are too many “likes” within a short period of time (a clue that something is going viral), resulting in the removal of offensive and cruel content. Facebook-imposed filters for offensive words. 

Table 1: Participants’ Design Recommendations. 


4.1 Design for Reflection 


In this study, the suggested design interventions and the accompanying explanations showed that some 
participants were aware of the value of reflection in countering cyberbullying behavior. One group, for 
example, suggested pop-up messages timed to last for ten seconds (in other words, users cannot close the 
pop-up dialogue box until after ten seconds). The group explained that during the ten second delay, social 
media users would have a chance to read and process the pop-up warning about cyberbullying. While not 
all users would use this mandatory pause to reflect thoughtfully on their own feelings and values, this design 
feature would afford them the opportunity to do so, at least according to some of the participants. 
Interestingly, the downside of this design feature (that it is an irritating intrusion) was not explored. A 
similar suggestion came from the members of one undergraduate group, who proposed the inclusion of alert 
boxes with reflective question prompts that would appear each time a Facebook user attempts to perform 
an activity on the site. When asked what would trigger the alert box, one of the participants explained, 
“Just anytime you click ‘like’ or something. It’s like, ‘why do you like something?’ or ‘why do you like this?’” 
Presumably, during this time of prompted reflection, a potential bully could reconsider posting a harmful 
message or image, or a potential bystander could reconsider clicking the “like” button in response to a 
negative comment and joining the bullying fray. 
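
The timed pop-up described above can be illustrated with a minimal sketch. It is not an implementation from the study or from any social media platform; the class name `ReflectivePopup`, its methods, and the injectable `clock` parameter are hypothetical, and only the ten-second pause comes from the participants’ suggestion.

```python
import time

PAUSE_SECONDS = 10  # the delay suggested by the participants


class ReflectivePopup:
    """A warning that cannot be dismissed until a mandatory pause has elapsed."""

    def __init__(self, message, pause=PAUSE_SECONDS, clock=time.monotonic):
        self.message = message
        self.pause = pause
        self.clock = clock          # injectable so the timing can be tested
        self.opened_at = clock()    # moment the pop-up is shown

    def seconds_remaining(self):
        """Time left before the pop-up may be closed (never negative)."""
        elapsed = self.clock() - self.opened_at
        return max(0.0, self.pause - elapsed)

    def try_close(self):
        """Return True only once the mandatory pause is over."""
        return self.seconds_remaining() == 0.0
```

A user interface built on such a class would keep the close button disabled while `try_close()` returns False, which is the “mandatory pause” the group had in mind.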


4.2 Design for Consequence 


In general, the cyberbullying narratives lacked consequences for the bully. Three of the four stories 
had ambiguous endings and in none of the stories were the “bullies” shown to face consequences for their 
behavior. The participants expressed a view that in order to solve the problem, it is vital to spread awareness 
that there would be certain consequences for individuals engaging in cyberbullying. Among the consequences 
identified by the participants were public shaming, getting in trouble with the school (and having to visit 
the guidance counselor or principal), getting in trouble with the police and the law, and imposed Facebook 
restrictions. 

Perhaps the most provocative suggestion was the addition of a bully button, to be activated when 
a damaging photo of the bullied individual was actively shared and liked. It would allow people to flag a 
bullying situation with comments such as, “REALLY mean comment”. The participants thought bullies 
would want to avoid the public shaming that would occur through the accumulation of “bully” points. Social media 
users, they suggested, would be afraid of being labeled a “bully” and would avoid this consequence by refraining 
from acts that may be perceived (and publicly dubbed) as bullying behavior. 
One of the undergraduate participants compared the Bully Button to the act of “liking” something on 
Facebook, explaining “Like you can see how many people like something, you can see how many people 
bully button a comment.” She continued to say, “It’s kind of embarrassing...If I have twenty bully buttons 
next to my comment, it’s like ‘you’re a big jerk.’ ” 


4.3 Design for Empathy 


The young people in this study felt that it was important for bullies and their followers to see how victims 
suffer. They assumed that a design feature that could make pain and sadness concrete would lead to bullies 
self-regulating their bullying behavior. Thinking specifically about Facebook, one group suggested more use 
of sad posts, sad songs, or emoticons by the victims on Facebook, believing that bullying wouldn’t “happen 
if the bullies realized they’re wrong...” This realization would come if “bullies can actually see that they’re 
causing this pain...” While Facebook posts, sad music, and the use of emoticons are not new design ideas, 
the suggestion does point to the need for a more emotive social media environment and raises the question as to whether 
there is a better way to bring these elements to the fore, to make them more apparent and impactful to 
others viewing the social media spaces of those who are bullied. 


4.4 Design for Personal Empowerment 


An imbalance of power seems to be inherent in cyberbullying and the participants suggested that the bullied 
victims need some power of their own. Bullies and their supporters have on their side the wild 
encouragement from within their circle, the anonymity of their acts, a lack of consequences, and the speed 
of networked, digital information for disseminating online bullying behavior. What power does the bullied 
have and how can the victim and his or her defenders’ power be embodied in design? Anonymity was a key 
theme here. Showing support for the bullied is predicated on the supporter having the safety of anonymity. 
Anonymity is therefore power. The "Bully Button," for example, would allow for observers to come to the 
defense of the victims by anonymously calling attention to bullying behavior. 

To young people, adults often represent power, but adults took an active role in defending the 
bullied in only one of the narratives. Interestingly, the anonymity of adults was not seen as beneficial. 
Indeed, knowledge that adults can lend support to the victim was exactly the point. One group of teens 
described a cyberbullying story where, rather than deal head-on with bullying, adults would instead “like” 
Joe, the victim, in order to protect his self-esteem. They would also respond directly to the bullies with 
posts in Facebook like, “That’s not very nice — I don’t appreciate you saying something mean about him.” 
By sharing and extending their power into the social media environment, adults would empower the victim. 


4.5 Design for Fear 

The participants made several suggestions that seemed to harness the power of fear as a means to deter a 
bully and, in doing so, support the victim. Not to be confused with Design for Reflection (design that 
affords the cognitive practice of thoughtful introspection), Design for Fear activates mechanisms that might 
cause social media users to feel caution before acting. The proposed design features seem to suggest that, 
at least according to the young people in this study, social media that affords fear might actually be a useful 
feature. 

In one of the teen group’s stories, the victim's defender turned into a bully and the original bully 
into a victim. To deal with this never-ending saga of fighting, the boys in this group suggested that pop-up 
messages should appear in Facebook with an ominous warning: “Stop bullying today or you could be next, 
Ricky” (Figure 2). This cryptic language, directed at a specific target, sounds more like a threat than a 
warning, and draws upon fear as a potential prompt for hesitation. It was not explained exactly how the 
system would know that the person is bullying, but more interesting was the use of personalization to get 
the user’s attention. Personalization, as the teens explained, gets your attention because you know the 
message is designed for you. Clearly, you are being watched and knowing that someone is watching you is 
frightening. 


4.6 Design for Attention 


Participants suggested that important anti-bullying messages need to be prominent, loud, and even 
irritating, in order to be noticed by the “bullies.” Personalization would also catch the attention of people. 
The threatening message, “Stop bullying today! Or you could be next, Ricky,” would work not only because 
of the fear it would engender, but because it was “tailor-made.” As a male teen explained, “If it says your 
name in there, you're definitely going to notice it...Like, you’re going to read it and be like ’Oh wait, that 
says my name.’” There was little concern about what personalization would mean for privacy. 


4.7 Design for Control and Suppression 


The participants proposed designs that would trigger the suppression of content either by Facebook 
administrators (some participants assumed that Facebook employees, not algorithms, were making decisions 
about individual posts) or through an algorithm. Three groups mentioned filtering for “rude words.” One 
group of undergraduate women in their early 20s was more specific about the kind of language that should 
be filtered, saying words like “slut” and “whore” should automatically be flagged and reviewed by Facebook. 
This same group also thought that images should be actively filtered by Facebook. An image that received 
200 “likes” in a short period of time should raise an immediate red flag for the social media system, since 
(at least according to this group’s cyberbullying narrative) mean and cruel behavior happens in a rapid 
surge of online postings. While one group of undergraduates thought there should be an algorithm built 
into the system to catch certain language and trending images, the system response they suggested was 
entirely human: someone at Facebook should look at the flagged messages and images and make a 
determination as to their appropriateness. As one participant said, “If there are more than like X number 
of ‘likes’ on a certain picture, the Facebook staff could look at it and be like ‘ok, we’re going to look at this 
and decide is it a good thing or a bad thing.’” 
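
The participants’ proposal (automatic flagging, then human review) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a description of how Facebook or any platform works. The threshold of 200 “likes” comes from the participants; the ten-minute window, the placeholder word list, and all names (`LikeSurgeDetector`, `contains_flagged_word`) are hypothetical.

```python
from collections import deque

LIKE_THRESHOLD = 200      # figure mentioned by the participants
WINDOW_SECONDS = 600      # assumption: participants did not specify "a short period"
FLAGGED_WORDS = {"rudeword"}  # placeholder; a real filter needs a curated list


class LikeSurgeDetector:
    """Tracks "like" timestamps for one item using a sliding time window."""

    def __init__(self, threshold=LIKE_THRESHOLD, window=WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        self.timestamps = deque()

    def record_like(self, now):
        """Record a like at time `now` (seconds).

        Returns True when the item should be queued for human review.
        """
        self.timestamps.append(now)
        # Discard likes that are older than the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold


def contains_flagged_word(text, flagged=FLAGGED_WORDS):
    """Return True if any word in `text` is on the flagged list."""
    words = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return any(w in flagged for w in words)
```

Both functions only raise a flag. In keeping with the participants’ suggestion, the decision about whether content is appropriate is left to a human reviewer rather than made by the algorithm.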


Figure 2: Design for Fear/Design for Attention: A personal message sends a threat and gets a bully’s 
attention: “STOP BULLYING TODAY! Or you could be next, Ricky” 
5 Discussion and Conclusion 


The participants in this study took their task seriously, suggesting a wide range of design themes: design 
for reflection, design for consequence, design for empathy, design for personal empowerment, design for fear, 
design for attention, and design for control and suppression. Collectively these design themes give the bullied 
and their supporters a range of active and passive tools that might pre-empt or push back against mean 
and cruel online behavior. It seems, at least to the participants, that well-being in social media environments 
is integrally linked to a balance of power between the bullied and the bullies. 

Several of the design themes offer a clear acknowledgement of the important role of the bystander 
and the inherent social nature of bullying in social media (Design for Consequence, and Design for Personal 
Empowerment, for example). According to the narratives in this study, it is simply not possible to 
disentangle the bully, the bullied, and the crowd that “watches” the story unfold. These are fluid, changing 
roles. The message from the participants is that if we are to design for well-being, at least in terms of social 
media, we need to move from a view of bullying as a dichotomous relationship between the victim and 
bully, toward a broader, community-wide conception of the problem. This may seem obvious, given that 
social media is inherently social, but many approaches to the prevention of cyberbullying seem to focus 
overwhelmingly on the relationship between the victim and bully — who bullies, who gets bullied, what to 
do, how to avoid, and who to talk to — and less so on the active roles that bystanders can play. 

Power is a pivotal theme that weaves its way throughout several of the design features, suggesting 
that well-being in social media environments is integrally linked to a balance of power between the bullied 
and the bullies. Empowering the victims by giving them tools to push back (some might say tools that give 
the bullies a taste of their own medicine) seems, according to some of the participants, important to healthy 
social media spaces. While not raised by the participants themselves, an interesting consideration for 
designers would be the consequences of social media spaces that even out the balance of power: Would this 
design recommendation result in a “cold war” style standoff or an escalation of bullying? 

As with all design, there is the intent and the unintended consequence. The participants, 
laser-focused on the task of ending cyberbullying, did not seem to consider the downside of their design 
suggestions. While this may have reflected the shape and protocols of the study, it may also reflect a gap 
in their understanding of the complexity of social media and the very idea that design has consequences. 
For example, the Bully Button, an aggressive tool for pushing back against the bullies and their henchmen, 
could quite easily lead to more bullying. Like the scarlet letter “A” that branded Hester Prynne an adulterer 
or the stocks in a village square that marked an individual a criminal or sinner, the designs for a Bully 
Button would allow for the punishment of the individual by the community. 

Several of the design features suggested by the participants clearly targeted affective aspects of 
design (Design for Empathy and Design for Fear in particular) and raise some interesting questions. When 
designing for empathy, the participants assumed that showing the suffering of the bullied would lead to an 
empathetic response from the bullies and result in an end to bullying. This is not always the case. It is important 
to note that there are two forms of empathy — cognitive and affective empathy (Karem et al., 2001). 
Cognitive empathy is knowing how other people feel while affective empathy is sharing other people’s feelings 
(or, as the saying goes, feeling their pain). Bullies can know about someone else’s pain but that may not 
change behavior if the bully or bullies simply don't care. Bullies can have cognitive empathy and use it in 
Machiavellian ways to bully their victim more effectively. So the design problem might not be to design 
for empathy but to design for caring. 

Design for Fear raises another concern. Fear is a powerful emotion and, in an evolutionary sense, 
humans are designed for fear. But there is an enormous amount of discomfort associated with fear, which 
raises the question: how does the system (in our study, Facebook) balance the fears and anxieties of the 
bullied with those of the bullies? At what point does causing someone to hesitate become fear-mongering 
and harassment and thereby derail the point of design for well-being? At what point does this design cease 
to create a healthy environment for users of social media? This question was not raised by the participants 
in the study. 

Perhaps the antithesis of designing for fear is Design for Reflection. This includes design 
interventions that encourage quiet and introspective thought. Akin to Hallnäs and Redström’s “slow 
technology” (2001), Design for Reflection is meant to encourage mindful, self-aware online behavior. It is 
interesting that while both the teen and undergraduate groups said that social media users should take steps 
to pause and think before acting, it was only the undergraduate students (young people in their early 20s) 
who suggested a specific mechanism (pop-up question prompts) in their first attempt at the “design” 
activity. While the younger participants were aware that users should stop and think before acting, they 
initially did not identify a specific strategy to enable such behavior. It was only with deeper probing by the 
investigators that one teen group did ultimately suggest a design feature, the pop-up warnings about 
cyberbullying. This may point to a developmental or experiential difference vis-à-vis the use of social media. 
Future research on designing for well-being should explore the relationship between age, experience, and 
design recommendations. 

In this study we asked participants to think about what social media might look like if it were 
designed to prevent or intervene in mean and cruel behavior. The design interventions suggested by the 
participants reflect their lived experiences, perceptions, and values related to social media environments. 
We are not arguing that all the design interventions suggested by the young people in this study be 
actualized into real designs. Rather, what we present here is a window into these young people’s fears, 
anxieties, values and ethics related to mean and cruel online behavior and their proposals as to how to curb 
it. Young people have an earnest desire for social media environments that encourage wellbeing. Designers 
should take from this framework lessons about the expectations that young people hold in relation to social 
media and build social media environments that reflect these needs. 

Abstract 

In the early 21st century, societies and their governments around the world have been meeting 
unprecedented challenges, many of which surpass the capacities, capabilities, and reaches of their 
traditional institutions and their classical processes of governing. Among these challenges are the need 
for an accelerated transition of the global economy from its current fossil fuel basis to renewable energies, 
the so-called post-carbon era also known as the third industrial revolution, the containment and reduction 
of government spending and debt financing, the increasing rapidity of market changes, and the expanding 
lag of timely interventions via traditional lawmaking and government action. While upholding the proven 
principles of Western democracy, democratic self-governance in 21st century market economies 
apparently needs to develop new institutional formats and novel mechanisms for staying abreast of 
the systemic dynamics of a tightly interconnected global society. We claim that actionable and 
omnipresent information along with its underlying technologies are substantial prerequisites and 
backbones for developing models of smart (democratic) governance, which foster smart, open, and agile 
governmental institutions as well as stakeholder participation and collaboration on all levels and in all 
branches of the governing process. We present and discuss an agenda for research and practice, which 
advances the concept of smart, open, and participatory government of the 21st century. 

1 Introduction 
This article is conceptual in nature, and its purpose is to spark interest and help shape an ongoing dialogue 
on the complex subject matter of smart governance as a foundation to smart, open, and participatory 
government among and between scholars and practitioners. In so doing, this contribution is rooted in the 
domain of electronic government research (EGR), which over the past decade and a half has produced a 
sizable and highly regarded body of academic knowledge of significant relevance to practice. EGR is a multi- 
disciplinary strand with contributions from information science, information systems research, public 
administration, computer science, and other disciplines. Whereas the dedicated scholarly community of 
EGR has grown to four-digit numbers, this article tries to reach beyond this community and also interest 
scholars from adjacent and other information-related fields. 

In the early 21st century, governments and publics have been confronted with several unprecedented 
challenges, which are complex and intertwined [61]: 


(1) The Third Industrial Revolution has begun to convert the basis of industrial activity from fossil 
fuels to renewable energies [57]. At the same time and in order to cope with increasing demand, the 
uses of primary energies need to become far less wasteful than during the waning carbon era. 
A fundamental part and strong co-driver of the Third Industrial Revolution is the concurrent 
informational revolution, which facilitates the immediate availability and ubiquity of actionable 
information via computer-mediated networks in all aspects of economic activity as well as in public 
and private life [61], leading to highly effective and efficient economic exchanges and societal 
interactions. 

(2) The Rapidity of Change and the Lack of Timely and Effective Intervention have caused several 
severe global and regional crises, for example, the 2008 financial meltdown in the United States of 
America. These crises have proven to surpass the steering, intervening, and counterbalancing 
capacities of national governments by a significant margin. The old mechanisms of relatively slow 
regulation and deregulation as well as the deliberately moderate processes of making, enforcing, and 
interpreting the law have proven less effective than ever when faced with rapid changes and 
developments [61]. New, more intelligent, and regionally/globally effective mechanisms need to be 
found, which preserve the principles of democratic process but nevertheless cope with the pace of 
change [61]. 

(3) Expansive Government Spending and Exorbitant Public Debt Financing have been blamed for 
eroding the stability and long-term sustainability of whole societies and nation states [25, 69]. While 
so far no evidence has been produced that the two phenomena are causally linked, taken together 
they present a huge challenge for a large number of nation states curtailing or even impeding the 
traditional ways of public policy making via debt-financed spending. The inevitable structural 
reform of public finances, however, has come at the oddest point in time possible given the other 
challenges, and it seriously complicates the search for adequate solutions. 


If left unaddressed, these three interacting challenges would most likely produce highly 
negative impacts on societal wellbeing and the collective and individual qualities of life in the 21st century. 
In some nations, exploding health care costs and rapidly aging populations might even exacerbate the 
situation. Against this background, the question has been raised how these multi-level challenges can be 
addressed, one at a time as well as conjointly due to their interdependencies and interactions [61]. 

With this article, we do not claim to be capable of charting potential paths to a comprehensive 
solution to a highly complex and dynamic problem. However, we assert that a reformed model of democratic 
self-governance, which rests on the principles of Western democracy and maintains its tradition, will play 
a major role in finding such paths. Moreover, a reformed model of Western democracy is both necessary 
and feasible. We refer to this reform as the New Models of Participation and the Evolution of Smart and Open 
Government in the 21st century. As key facilitators of these new models of participation and smart and open 
government we envision ubiquitously available, symmetrically shared, and immediately 
actionable information based on and provided by modern information technologies allowing for smart 
(democratic self-) governance of society. At a high level of abstraction, our research questions hence read: 


(RQ #1) What are elements of smart governance and smart and open government, and how might 
they interact? 


Based on the results of RQ#1, it follows, 


(RQ #2) What research and practice agenda would logically support the development of smart 
governance models as well as the evolution of smart and open government? 


In the following, we first briefly review the extant literature on smart governance and smart and open 
government. Addressing RQ#1, we then discuss the elements of smart governance and their interactions. 
Based on this discussion, we present a research and practice agenda capable of advancing the evolution of 
smart and open government. Finally, we present our conclusions and aims of future research. 

2 Literature on Smart Governance and Smart and Open Government 


Despite the claim that the first Obama administration in 2009 was first to introduce the notion of open 
government, the concept was presented and discussed elsewhere far earlier [11, 19, 20]. In the US, major 
legislative elements of open government were put in place as early as in the mid-1960s, for example, the 
Freedom of Information Act of 1966 with its various amendments over the decades including the Open 
Government Act of 2007. However, the Administration’s open government initiative of 2009 marked a 
radical switch from reactive and lackluster information provisioning to proactive information sharing by the 
federal administration [42, 58, 64, 65]. This paradigmatic shift sparked the launch of numerous similar 
initiatives at local and State levels in the US as well as in other countries around the world [30]. It also 
reinforced the attention of academic scholarship as evinced by the greatly increased number of published 
studies on the subject ever since. 

The aim of the initiative, which was formally enforced via an Executive Office directive, was to 
provide transparency to government decision-making, improve accountability, and foster collaboration and 
stakeholder participation [55]. Practically, the directive required departments and agencies to make 
all unclassified government records publicly available in electronic form. However, it also requested from 
each department a detailed plan for collaboration with and participation of other stakeholders including 
businesses and citizens. Direct involvement and participation in government service provision and decision 
making were understood as integral nodes in a feedback loop that safeguarded against falling back into non- 
open government practices [50, 55, 64, 65]. The effects of such open government initiatives have been studied 
since, for example, [31, 34, 44, 46, 47, 50] including the metrics and processes for measuring the success of 
such initiatives [14]. In terms of participation, government-related social media and social networking studies 
have also mushroomed over the years since 2009, for example, [12, 13, 15, 16, 21, 28, 29, 35, 43, 48, 49, 67, 
71], some of which, however, found the extent of influence from participation rather minute [35, 50]. 

Along with the notion of open government appeared the concepts of “lean” and “transformational” 
government, which integrated the ideas of third-party service co-provision, high-leverage of government 
funding via information and communication technologies (ICTs), and gradual service and process 
improvement via recurring experimentation [37]. “Doing more with less” [37] has become the mantra of 
budget-squeezed government agencies around the world, and electronic government practices in general 
have shown quite some potential for effectively supporting such ends [10, 17, 39, 60, 66]. 

In parallel and rather independently, another strand of practice and research dedicated to local 
electronic government has emerged that developed the notion of a smart city and in close conjunction with 
it Smart City Government. A smart city as an urban space would have the characteristics of a culture of 
innovation [32, 36, 52], a high quality of life also referred to as “livability,” global competitiveness and 
attractiveness, security, and safety, as well as economic and environmental sustainability [7, 22]. A smart 
city would have a smart city government, which manages and implements policies towards those ends by 
leveraging ICTs and institutions and by actively involving and collaborating with stakeholders [7, 8, 51, 
52]. Early empirical studies on smart city initiatives indicate that despite some peculiarities and differences 
between the initiatives the principles of open, transparent, and participatory government appear to be 
an integral part of those initiatives [7, 8, 22, 51]. 

The earliest mention of the combined terms of smart and government that we were able to find 
dates back to a short World Bank report on civil service reform [54]. The term was also used without the 
introduction of a formal definition in a report on the computerization of government operations in the 
Indian State of Andhra Pradesh [68]. More recently, former US president Bill Clinton utilized the term in 
the presentation of his views on the future role of government [23]. Last, one of the core conferences in 
EGR, the Digital Government Society’s dg.o 2013 conference, was held under the motto of 
“From e-Government to Smart Government” (http://dgo2013.dgsna.org). 

In contrast to these rather vague uses of the term, the concept of smart governance has received 
more attention and formal academic treatment. Smart governance, according to Wilke (2007, p. 165), “is 
an abbreviation for the ensemble of principles, factors, and capacities that constitute a form of governance 
able to cope with the conditions and exigencies of the knowledge society” [72]. The author further 
acknowledges that smart governance is about “redesigning formal democratic governance” while maintaining 
the historically developed democratic principles and a free market economy [72]. Smart government, hence, 
has to cope with (a) complexity and (b) uncertainty, and by so doing, has to (c) build competencies and 
(d) achieve resilience [72], the latter two of which have also been referred to as smart governance 
infrastructure, which is seen as an agglomerate of hard and soft elements such as norms, policies, practices, 
information, technologies, skills, and other resources [38]. When developing smart governance 
infrastructures, several key factors have been identified such as problem focus, feasibility/implementability, 
stakeholders’ contributability, continued engagement, coordination, and access to open data and shared 
information [38]. 

In summary, so far the two concepts of smart governance and smart government have only been 
rudimentarily developed. While the former has recently caught some academic attention along with some 
foundational theoretical treatment, the latter has not been conceptually developed although component 
elements such as openness and transparency of government decision-making and actions, open information 
sharing, stakeholder participation and collaboration, leveraging government operations and services via 
intelligent and integrated technology use, as well as government’s role of facilitator of innovation, 
sustainability, competitiveness, and livability seem to converge to a unified concept of smart and open 
government. Obviously, smart government also rests on the foundation of smart governance, suggesting 
that both concepts are closely related. Neither concept has been empirically studied in any 
comprehensive way. However, practitioners have begun employing many elements of both smart 
governance and smart and open government in projects. Hence, it appears appropriate to focus academic attention 
on the further development of the two concepts, so as to benefit both practice and academic discourse on the 
two interrelated phenomena. 

As a reminder, this contribution is conceptual. That is, it pursues the aim of developing a clearer 
and expanded academic understanding of, in general, a phenomenon of interest, and in this case, how 
societal wellbeing and livability in the 21st century can be maintained against the background of the three major 
and intertwined challenges portrayed above. In so doing, a concept paper connects related elements, which 
are already known or have already been proposed for study, and puts them into the particular context of 
interest explaining and discussing how the phenomenon can be studied in the given context, and why it is 
important to better understand it. As we asserted above, the evolution and active development of smart 
public governance and smart and open government are interdependent and appear as essential responses 
when addressing the three challenges to societal wellbeing and livability in this century. Along these lines, 
we next discuss the two research questions posed above. 


3 Smart Governance and Smart and Open Government 


3.1 What are the elements, and how might they interact (RQ#1)? 


We follow Wilke (2007) in holding that, for meeting complexity and uncertainty, respective competencies need to be 
developed, and a resilient governance environment needs to be created [72]. Resilience has been defined as 
a “process linking a set of adaptive capacities to a positive trajectory of functioning and adaptation after a 
disturbance” [53], see also [18, 24]. In other words, the competencies need to be adaptive and capable of 
serving in a process of coping with complexity and uncertainty. Johnston and Hansen’s (2011) enumeration 
of infrastructural elements of smart governance (norms, policies, practices, information, technologies, skills, 
and other resources) [38] provides further details of the process elements, which need to be adaptive in order 
to provide for resilience. 

Relative to the overall goal (“Preserving and developing societal wellbeing and livability in the 21st 
century”) and the three challenges to reaching that goal (“third industrial revolution,” “rapidity of change 
and the lack of timely and effective intervention,” and “expansive government spending and exorbitant 
public debt financing”), we found evidence that eight select areas have been put into focus and are likely 
candidates for smart governance initiatives (for each area we briefly use examples, current issues, and key 
points from the German federal government and German research centers for illustration, although we could 
have likewise used sources from elsewhere): 


1. Budgeting/controlling/evaluating. Example: Under the title “Growth-friendly consolidation” the 
German Federal Ministry of Finance details a multiyear approach of shrinking government spending 
while maintaining high levels of governmental investments in growth-related and future-oriented 
areas [9]. 

2. Electronic government/administrative modernization/process streamlining. Examples and issues: 
the German e-government (EGOV) law (eGovG) postulates simplified and reliable administrative 
processes, needs orientation, economic efficiency, ecological sustainability, modular and adequate 
ICT support, and a leading role in EGR [1, 4]; however, despite these high aspirations and its 
economic weight, Germany ranks only 17th in the most recent UN EGOV rankings [3]. 

3. Security and Safety. Examples: Responding to the sensitivities of the electorate, German 
governments at all levels have traditionally upheld relatively high standards with regard to data 
security, privacy, and data parsimony [40]. So far, the focus has been on secure and confidential 
uses of data [5]. However, these practices might need review and reformulation in terms of open data 
initiatives (see below, 7.), with which they may create tensions. 

4. Infrastructure Overhaul and Ubiquitous High-speed Connectivity. Examples and key points: 
Germany hosts a number of smart grid projects (in sectors such as energy, traffic, and ubiquitous 
gigabit Internet) [56]. The latter is badly needed, since the country ranks only 19th worldwide in 
terms of average Internet bandwidth [6]. 

5. Electric Mobility. Example: the German Federal government embraced the notion of electric 
mobility, which would convert individual traffic from fossil fuels to electricity in the long-term, 
relatively early [2]. 

6. Participation and Collaboration. Examples and key points: Social media and social networking uses; 
individual information services; active and individual involvement [5]; fostering individual 
contributions from citizens and showcasing the effects of contributions [1]. 

7. Open Data / Big Data Provision and Use. Examples and key points: Provision of accurate, 
comprehensive, and reliable information [5]; transparency of data uses; accounting for the 
effectiveness of participatory contribution [1]; currently, open data initiatives in Germany are only 
partial and selective [40]. 

8. Open Government, Transparency, and Accountability. Examples, issues, and key points: Although 
closely related to open data, open government goes beyond the mere provision of government data; 
rather, it has to encompass a proactive involvement of stakeholders in the public decision making 
processes [45]. Transparency appears as a key to effective administration of the 21st century as well 
as to the legislative process [1]. An urgent need for significant research on the subject has been 
identified [45]. 


In summary, these eight areas seemingly address all three aforementioned challenges, at least to a 
notable extent: For example, all eight areas appear to either directly or indirectly address the challenges 
of the third industrial revolution, in particular, in the areas of infrastructure overhaul, ubiquitous high-speed 
connectivity, electric mobility, and administrative modernization. The areas of strict budget controls 
and evaluation, transparency, and open data directly address the challenge of expansive government 
spending and debt financing whereas the challenge of rapidity of change and lack of timely and effective 
intervention is directly addressed in all areas except for electric mobility. Based on this understanding, we 
next turn to addressing the second research question. 


3.2 What research and practice agenda would logically support the development of smart 
governance models as well as the evolution of smart and open government? (RQ#2) 


When cross-tabulating the smart governance elements [38] with the eight areas of focus as presented in the 
previous section, a roadmap for both practice as well as for research emerges (see schematic Table 1 in the 
appendix). For space reasons, we refrain from presenting all fifty-six cells of the smart governance grid in 
detail but rather focus on a few for illustration purposes. 
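
The cross-tabulation described above can be illustrated with a minimal sketch. It assumes that the seven elements are those enumerated earlier (norms, policies, practices, information, technologies, skills, and other resources) and that the eight areas are those listed in section 3.1; the names `ELEMENTS`, `AREAS`, `build_grid`, and `open_cells` are hypothetical and serve only for illustration, not as a reproduction of Table 1.

```python
from itertools import product

# Seven smart governance elements (labels follow the subsections of this paper).
ELEMENTS = [
    "Norms",
    "Policies",
    "Practices",
    "Information",
    "Information, communication, and other technologies",
    "Skills and human capital",
    "Other resources",
]

# Eight areas of focus as listed in section 3.1.
AREAS = [
    "Budgeting/controlling/evaluating",
    "Electronic government/administrative modernization/process streamlining",
    "Security and safety",
    "Infrastructure overhaul and ubiquitous high-speed connectivity",
    "Electric mobility",
    "Participation and collaboration",
    "Open data/big data provision and use",
    "Open government, transparency, and accountability",
]


def build_grid(elements, areas):
    """Return one empty list of research questions per (element, area) cell."""
    return {(element, area): [] for element, area in product(elements, areas)}


def open_cells(grid):
    """Return the cells for which no research question has been recorded yet."""
    return [cell for cell, questions in grid.items() if not questions]


grid = build_grid(ELEMENTS, AREAS)

# Illustrative entry only: one question from the "Norms" discussion,
# placed in the budgeting cell (an assumed placement).
grid[("Norms", "Budgeting/controlling/evaluating")].append(
    "What are measures of success or effectiveness?"
)

print(len(grid))              # number of cells in the grid
print(len(open_cells(grid)))  # cells still without a question
```

Keying the grid by (element, area) pairs keeps every cell addressable individually, which mirrors the way the following paragraphs pick out single cells for illustration.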

Norms. In the area of budgeting/controlling/evaluating, for example, new and smart standards 
need to be developed, and new and more intelligent budgetary algorithms need to be found and tested. As 
discussed elsewhere [61], current spending levels and debt financing schemes cannot be maintained. As the 
above quoted example of the German Finance Ministry shows, investments in select and promising growth 
areas at the right time and in volumes of critical mass might be one possible path. As a principle, while 
spending levels remain capped or reduced, sizable, continued, and focused investments are still made. The 
question, which cannot be answered as of yet, is to what extent and how long should focused investments 
be continued? Further, what are measures of success or effectiveness? Also, what are the consequences of 
divesting (or under-investing in other areas)? What are acceptable review cycles and review participants? 
How can smart norms be further developed or changed? Who decides? How and when is stakeholder input 
used? As an example of a highly unintelligent approach to the government spending/debt financing crisis, 
we name the sequestration approach, as it was practiced in the 2013 fiscal year in the US. While the 
across-the-board 20-percent cut of the federal budget, in fact, significantly reduced the spending column of the 
balance sheet, it did so without any discrimination, leaving areas of strategic growth potential 
under-invested. 

Policies. Smart policies have the characteristics of both sustainability and adaptability [33, 59, 70]. 
These two characteristics are critical when it comes to addressing the challenge of rapidity of change and 
timely and effective intervention. As outlined before, seven of the eight areas either directly or indirectly address 
this challenge. In the area of infrastructure overhaul and ubiquitous high-speed connectivity, policy obstacles 
may arise when traditional business models such as those of large energy providers or telecommunications 
providers are at stake, or even potentially disrupted. Internet providers in the US, for example, show little 
enthusiasm to use public grey channels on existing public fiber-optic networks and make available gigabit 
connectivity to the premises at lowest cost. It appears that some oligopolistic business models benefit from 
managing scarcity and shortage rather than from providing abundance of bandwidth [26]. Smart governance, 
and, in particular, smart policymaking, needs to strike a balance between protecting old business models and 
paving the path to coping with rapid change. What can be drivers and enablers for overcoming oligopolistic 
and monopolistic resistances to change? What coalitions can be formed to foster smart policies? How can 
the effectiveness of policies be monitored and measured? What are successful policies within smart 
governance? What models have been observed, and what lessons were learned? What are the elements of a 
smart policy development process? When do smart policies lose their effectiveness? As pointed out before, 
smart policymaking pertains to all areas, and shapes both the overall smart governance models as well as 
the institutional and administrative settings and enactments of smart and open government based on smart 
governance models. 

iConference 2014 Hans J. Scholl & Margit C. Scholl 

Practices. Smart practices apply to all eight areas of focus. However, as an example, in the area of 
ICT-induced administrative modernization and streamlining, also more popularly referred to as electronic 
government, a tradition of current-practice information sharing via practitioner exchanges or 
practice/academia exchanges has developed over the years (for example, via practitioner portals such as 
www.govloop.org or https://www.nascio.org). Such current practice-related exchanges would clearly also 
apply to smart practices. Furthermore, academic research has played an important role in influencing and 
shaping the evolution of electronic government through frequent exchanges as well as action research 
projects. This interaction between academia and practice will be equally important in the area of smart 
governance practices as well as practices of smart and open government. Both practice and academia would 
help identify, for example, what practices, if any, are characteristic of the development and realization of 
smart governance as well as of smart and open government. Further, what makes such practices smart? 
What practices can be transferred from one context to another? How can the effectiveness of smart practices 
be monitored and measured? What limitations exist? This list of research and practice-related questions 
with regard to smart practices is, of course, not exhaustive. 

Information. The linchpin of smart governance and its enactment in terms of smart and open 
government is shared, timely, and actionable information, which is fundamental in all eight areas of focus. 
Information sharing has been touted as quintessential for inter- and intra-governmental collaboration as 
well as for government-to-citizen and government-to-business interaction [27, 41, 63]. As pointed out above, 
timely and actionable information, once open and shared, also provides for transparency, accountability, 
and stakeholder participation. In that capacity, shared information is also the indispensable prerequisite for 
smart governance. Research and practice-related questions include: What are enablers and obstacles for 
information sharing? What quality of information is needed for enabling smart governance? How can 
context-relevant, timely and actionable information be distilled from an ocean of open big data? What 
information visualization approaches can be used, and how effective are they? What information-sharing 
policies are needed for enabling and maintaining smart governance? How can information asymmetries be 
detected? What information should be open, what should not be open, and why? What are acceptable 
balances between the need-to-know and individual privacy? What are the constitutional, legal, and practical 
limitations to government surveillance of global digital traffic, and how can those limitations, if any, be 
either overcome or enforced? 

Information, Communication, and Other Technologies. ICTs and other related technologies have 
become core facilitators of the information revolution, which in itself is both the engine and the backbone 
of the third industrial revolution as discussed above. In the context of smart governance, ICTs and other 
technologies play highly critical roles as they technically facilitate the “smartness” of governance, and 
consequently, government. In that sense, they apply to and permeate all eight areas of focus. Their absence 
or malfunction, even temporary, can deprive entire organizations and processes of regular functioning within 
no time. ICT ubiquity and highest availability have become the normal case and expectation even 
in remote and barren environments. ICTs and other technologies have helped redefine and redesign 
traditional formats of process and structural organization. They have also made possible completely novel 
processes and structural formats. In smart governance, research and practice-related questions may include: 
What new processes and formats can be facilitated via ICTs and other technologies? What traditional 
processes and formats can be replaced, streamlined, and redesigned by the use of ICTs and other 
technologies? What are the impacts of such changes on the models of smart governance and smart and 
open government, respectively? What are desired outcomes of ICT-induced changes, and what are undesired 
outcomes of such changes, and why? What are the policy implications of the accelerated proliferation of 
ICTs and other technologies? 

Skills and Human Capital. Smart governance, which relies and rests on timely and actionable 
information as well as the underlying facilitating ICTs, requires human skills capable of bringing the 
component parts of smart governance into action and interaction. Besides technological savviness, this 
necessitates the understanding of process, policy, and people when developing and maintaining models of 
smart governance. Educational and developmental programs, hence, need to be integrative, creating fused 
high-level literacy in technical areas and non-technical areas of academic and professional development 
alike. Smart governance can flourish when the old schism vanishes that divided the business side of an 
organization from its ICT side. Research and practice-related questions encompass: What sets of skills need 
to be developed and combined for enacting smart governance models and smart and open government? How 
frequently are educational and developmental updates to such skills necessary? What educational formats 
are most effective and economical? What are the necessary levels of investment in the development and 
maintenance of human skills? What are the measurable consequences of continued under-investment in the 
development and maintenance of human skills? 

Other Resources. Beyond the identifiable components discussed above, smart governance might 
require additional resources in any area of focus, which may emerge along the lines of development and 
practice. The research and practice-related questions, hence, include: What other resources are necessary to 
develop and maintain models of smart governance? What are their characteristics, and how do they 
contribute to the overall outcome? Why are they important, and how critical are they? How can they be 
replaced or emulated, if inaccessible? How can they be identified ex ante? 

In summary, when cross-tabulating the elements of smart governance with the areas of focus as 
addressed in early smart governance initiatives, it becomes clear that a whole host of research and practice- 
related problems need to be better understood. Academic research can effectively support the evolution of 
smart governance, and with it, smart and open government, in practice. Academic research can in particular 
accelerate the learning process and implementation by systematically sharing the results of studies across 
all elements of smart governance. This will predictably lead to sounder and more elaborated models of smart 
governance than when such initiatives are left to trial-and-error approaches in practice alone. Since quite a 
few smart governance initiatives are in their early stages, research, including (participatory) action research, 
should accompany such initiatives and should be funded as an integral part of smart governance as well as 
smart and open government projects. 

In this context, we would like to point out that in applied research, in general, with EGR being no 
exception, a tendency was found to mostly focus on desirable and successful project outcomes [62]. However, 
confining the study of smart governance that narrowly runs the risk of neglecting important lessons learned 
from failure and undesirable project outcomes. Two types (A and B) have been identified for warranting 
scrutiny and study [62], see Figure 1. We propose to also focus research on smart governance/smart and 
open government projects with outcomes of type A (desirable/unsuccessful) and type B 
(undesirable/successful). As a case in point for a type B project outcome, the wholesale surveillance of 
Internet protocol-based global digital traffic by the National Security Agency and other agencies elsewhere 
might be cited, although admittedly the assessment of this outcome’s desirability or undesirability might 
vary depending on the stance of the beholder. 

Figure 1: Problem Outcome Matrix - Type A and Type B Outcomes are understudied 


4 Conclusion and Future Research 


It has been the object of this article to make the case and present a roadmap for the study of the phenomena 
of smart governance as well as smart and open government as an enactment of smart governance in practice. 
As a concept paper, this contribution aimed at sparking interest and at inspiring scholarly and practitioner 
discourse in this area of study inside the community of electronic government research and practice, and 
beyond. The roadmap presented here comprises and details seven elements of smart governance along with 
eight areas of focus in practice. 

Smart governance along with its administrative enactment of smart and open government, it was 
argued, can help effectively address the three grand challenges to 21st century societal and individual 
well-being, which are (a) the Third Industrial Revolution with the information revolution at its core, (b) the 
rapidity of change and the lack of timely and effective government intervention, and (c) expansive 
government spending and exorbitant public debt financing. Although not seen as a panacea, it was also 
argued that smart governance principles could guide the relatively complex administrative enactment of 
smart and open government more intelligently than traditional static and inflexible governance approaches 
could do. 

Since much of the road ahead, metaphorically speaking, leads through uncharted territory, dedicated 
research is needed that accompanies projects in this area and evaluates them. Research could further be 
embedded into practical projects providing for fast and systematic learning. We believe that such embedding 
of research into smart governance projects should become an integral part of smart projects’ agendas. 

Finally, in Figure 2 we summarize the context and trajectory as well as the main areas of the smart 
governance and smart and open government evolution: Emanating from traditional electronic government 
research, smart governance research will encompass broader fields of interest such as smart administration, 
smart interaction with stakeholders, smart security and safety, and smart infrastructures, which in turn are 
enclosed in the larger contexts of 21st century society and environment. 

Figure 2: The Trajectory from E-Government Research to Smart and Open Government Research 


5 References 


"National E-Government Strategy: IT Planning Council decision of 24 September 2010." vol. 2013: IT- 
Planungsrat, 2010. 

"German Federal Government’s National Electro-mobility Development Plan." vol. 2013: Federal Ministry 
of Transport, Building, and Urban Development ("Bundesministerium für Verkehr, Bau und 
Stadtentwicklung"), 2009. 

"United Nations E-Government Development Database (UNeGovDD)," 2012 ed. vol. 2013: United Nations 
Public Administration Programme, Division for Public Administration and Development 
Management (DPADM), UN Department of Economic and Social Affairs (UNDESA), 2012. 

"E-Government-Gesetz (in German, e-Government Act)." vol. 2013: Federal Ministry of the Interior 
("Bundesinnenministerium"), 2013. 

"Fields of innovation of the digital world: Needs of the day after tomorrow (in German, "Innovationsfelder 
der digitalen Welt: Bedürfnisse von übermorgen")," Future Study ("Zukunftsstudie"), vol. V, ed. vol. 
2013: Münchner Kreis, 2013. 

"The state of the Internet, 4th Quarter, 2012 executive summary," volume 5, issue 4, ed. vol. 2013: Akamai 
Technologies, 2013. 

S. Alawadhi, A. Aldama-Nalda, H. Chourabi, R. J. Gil-Garcia, S. Leung, S. Mellouli, T. Nam, T. Pardo, 
H. J. Scholl, and S. Walker, "Building Understanding of Smart City Initiatives," in Electronic 
Government. vol. 7443, H. J. Scholl, M. Janssen, M. Wimmer, C. Moe, and L. Flak, Eds.: Springer 
Berlin / Heidelberg, 2012, pp. 40-53. 

S. AlAwadhi and H. J. Scholl, "Aspirations and Realizations: The Smart City of Seattle," in 46th Hawaii 
International Conference on System Sciences (HICSS-46), Wailea, HI, USA, 2013, pp. 1695-1703. 

Anonymous, "Growth-friendly consolidation ("Wachstumsfreundliche Konsolidierung")." vol. 2013: Federal 
Ministry of Finance ("Bundesfinanzministerium"), 2013. 




iConference 2014 Hans J. Scholl & Margit C. Scholl 


J. Becker, P. Bergener, and M. Rackers, "Business Process Assessment and Evaluation in Public 
Administrations using Activity Based Costing," in 15th Americas Conference on Information 
Systems (AMCIS 2009). vol. Paper 16 San Francisco, CA: AIS, 2009, pp. 1-8. 

C. Bennett, "From the Dark to the Light: The Open Government Debate in Britain," Journal of Public 
Policy, vol. 5, pp. 187-213, 1985. 

J. C. Bertot, P. T. Jaeger, and J. M. Grimes, "Using ICTs to create a culture of transparency: E-government 
and social media as openness and anti-corruption tools for societies," Government Information 
Quarterly, vol. 27, pp. 264-271, 2010. 

J. C. Bertot, P. T. Jaeger, and J. M. Grimes, "Promoting transparency and accountability through ICTs, 
social media, and collaborative e-government," Transforming Government: People, Process and 
Policy, vol. 6, pp. 78-91, 2012. 

J. C. Bertot, P. McDermott, and T. Smith, "Measurement of Open Government: Metrics and Process," in 
45th Hawaii International Conference on System Sciences (HICSS-45), Maui, HI, USA, 2012, pp. 
2491-2499. 

L. Bode, "Facebooking It to the Polls: A Study in Online Social Networking and Political Behavior," Journal 
of Information Technology & Politics, vol. 9, pp. 352-369, 2012. 

E. Bonsón, L. Torres, S. Royo, and F. Flores, "Local e-government 2.0: Social media and corporate 
transparency in municipalities," Government Information Quarterly, vol. 29, pp. 123-132, 2012. 

M. M. Brown, "The Benefits and Costs of Information Technology Innovations: An Empirical Assessment 
of a Local Government Agency," Public Performance & Management Review, vol. 24, pp. 351-366, 
2001. 

M. Bruneau, S. E. Chang, R. T. Eguchi, G. C. Lee, T. D. O'Rourke, A. M. Reinhorn, M. Shinozuka, K. 
Tierney, W. A. Wallace, and D. von Winterfeldt, "A framework to quantitatively assess and 
enhance the seismic resilience of communities," Earthquake spectra, vol. 19, pp. 733-752, 2003. 

H. E. Chandler, "Towards open government: official information on the Web," New Library World, vol. 99, 
pp. 230-237, 1998. 

R. A. Chapman and M. Hunt, Open Government in a Theoretical and Practical Context. Aldershot, Hants, 
England ; Burlington, VT: Ashgate, 2006. 

Y. Charalabidis and E. Loukis, "Participative Public Policy Making Through Multiple Social Media 
Platforms Utilization," International Journal of Electronic Government Research, vol. 8, pp. 78- 
97, 2012. 

H. Chourabi, T. Nam, S. Walker, J. R. Gil-Garcia, S. Mellouli, K. Nahon, T. A. Pardo, and H. J. Scholl, 
"Understanding smart city initiatives: An integrative and comprehensive theoretical framework," in 
45th Hawaii International Conference on System Sciences, Maui, Hawaii, 2012, pp. 2289-2297. 

B. Clinton, Back to work : why we need smart government for a strong economy, 1st ed. New York: Alfred 
A. Knopf. 

L. K. Comfort, A. Boin, and C. C. Demchak, Designing resilience : preparing for extreme events. Pittsburgh, 
Pa.: University of Pittsburgh Press, 2010. 

C. Cottarelli and A. Schaechter, "Long-Term Trends in Public Finances in the G-7 Economies," IMF, 2010, 
pp. 1-24. 

S. P. Crawford, Captive audience: the Telecom industry and monopoly power in the new gilded age: Yale 
University Press. 

S. S. Dawes, "Interagency information sharing: Expected benefits, manageable risks," Journal of Policy 
Analysis and Management, vol. 15, pp. 377-394, 1996. 

A. Deligiaouri, "Open Governance and E-Rulemaking: Online Deliberation and Policy-Making in 
Contemporary Greek Politics," Journal of Information Technology & Politics, vol. 10, pp. 104-124, 
2013. 




R. Effing, J. Hillegersberg, and T. Huibers, "Social Media and Political Participation: Are Facebook, Twitter 
and YouTube Democratizing Our Political Systems?," in Electronic Participation. vol. 6847, E. 
Tambouris, A. Macintosh, and H. Bruijn, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, 
pp. 25-35. 

I. A. Elbadawi, "The State of Open Government Data in GCC Countries," in 12th European Conference 
on eGovernment (ECEG 2012), Barcelona, Spain, 2012, pp. 193-200. 

A. M. Evans and A. Campos, "Open Government Initiatives: Challenges of Citizen Participation," Journal 
of Policy Analysis and Management, vol. 32, pp. 172-185, Win 2013. 

R. Giffinger, C. Fertner, H. Kramar, R. Kalasek, N. Pichler-Milanovic, and E. Meijers, "Smart Cities: 
Ranking of European Medium-Sized Cities," Centre of Regional Science (SRF), Vienna University 
of Technology, Vienna, Austria, 2007. 

M. Glachant, "The need for adaptability in EU environmental policy design and implementation," European 
Environment, vol. 11, pp. 239-249, 2001. 

T. M. Harrison, S. Guerrero, G. B. Burke, M. Cook, A. Cresswell, N. Helbig, J. Hrdinova, and T. Pardo, 
"Open government and e-government: Democratic challenges from a public value perspective," 
Information Polity: The International Journal of Government & Democracy in the Information 
Age, vol. 17, pp. 83-97, 2012. 

S. Hofmann, M. Rackers, D. Beverungen, and J. Becker, "Old Blunders in New Media? How Local 
Governments Communicate with Citizens in Online Social Networks," in 46th Hawaii International 
Conference on System Sciences (HICSS-46), Wailea, HI, USA, 2013, pp. 2023-2032. 

R. G. Hollands, "Will the real smart city please stand up?," City, vol. 12, pp. 303-20, 2008. 

M. Janssen and E. Estevez, "Lean government and platform-based governance—Doing more with less," 
Government Information Quarterly, vol. 30, Supplement 1, pp. S1-S8, 2013. 

E. W. Johnston and D. L. Hansen, "Design lessons for smart governance infrastructures," American 
Governance, vol. 3, pp. 1-30, 2011. 

N. T. Khayyat, "Effects of Information Technology on Cost, Quality and Efficiency in Provision of Public 
Services," Information and Communication Technologies Policies and Practices, pp. 73-90, 2010. 

J. Klessmann, P. Denker, I. Schieferdecker, and S. E. Schulz, "Open Government Data Germany: A 
study on open government in Germany commissioned by the Federal Ministry of the Interior (in 
German, "Open Government Data Deutschland: Eine Studie zu Open Government in Deutschland 
im Auftrag des Bundesministerium des Innern")." vol. 2013: Federal Ministry of the Interior 
("Bundesinnenministerium"), 2012. 

R. Klischewski and H. J. J. Scholl, "Information quality as capstone in negotiating e-government integration, 
interoperation and information sharing," Electronic Government: An International Journal, vol. 5, 
pp. 203-225, 2008. 

J. R. T. Lewis, "FOIA and the Emergence of Federal Information Policy in the 1980s and 1990s," in 
Handbook of Public Information Systems, G. D. Garson, Ed. New York: Marcel Dekker, 2000, pp. 
41-52. 

A. L. Lim, M. Masrom, and S. Din, "eGovernment and eGovernance: Concepts and Constructs," in 12th 
European Conference on eGovernment (ECEG 2012), Barcelona, Spain, 2012, pp. 844-851. 

D. Linders and S. C. Wilson, "What is open government?: one year after the directive," in 12th Annual 
International Conference on Digital Government Research (dg.o 2011), College Park, MD, USA, 
2011, pp. 262-271. 

J. v. Lucke, "Memorandum on the Opening of Government and Administration (Open Government): 
Position paper of the section Administrative Informatics and the section Informatics and Law in 
Public Administration of the (German) Society for Informatics (in German, "Memorandum zur 
Öffnung von Staat und Verwaltung (Open Government): Positionspapier der Fachgruppe 
Verwaltungsinformatik und des Fachbereichs Informatik in Recht und öffentlicher Verwaltung der 
Gesellschaft für Informatik")." vol. 2013: Gesellschaft für Informatik e.V., 2012. 

P. McDermott, "Building open government," Government Information Quarterly, vol. 27, pp. 401-413, 2010. 

A. J. Meijer, D. Curtin, and M. Hillebrandt, "Open government: connecting vision and voice," International 
Review of Administrative Sciences, vol. 78, pp. 10-29, Mar 2012. 

I. Mergel, "The social media innovation challenge in the public sector," Information Polity: The International 
Journal of Government & Democracy in the Information Age, vol. 17, pp. 281-292, 2012. 

I. Mergel, "Social media adoption and resulting tactics in the U.S. federal government," Government 
Information Quarterly, vol. 30, pp. 123-130, 2013. 

T. Nam, "New Ends, New Means, but Old Attitudes: Citizens' Views on Open Government and Government 
2.0" in 44th Hawaii International Conference on System Sciences (HICSS-44), Kauai, Hawaii USA, 
2011, pp. 1-10. 

T. Nam and T. A. Pardo, "Conceptualizing smart city with dimensions of technology, people, and 
institutions," in 12th Annual International Conference on Digital Government Research (dg.o 
2011), College Park, MD, USA, 2011, pp. 282-291. 

T. Nam and T. A. Pardo, "Smart city as urban innovation: focusing on management, policy, and context," 
in 5th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2011), 
Tallinn, Estonia, 2011, pp. 185-194. 

F. Norris, S. Stevens, B. Pfefferbaum, K. Wyche, and R. Pfefferbaum, "Community Resilience as a 
Metaphor, Theory, Set of Capacities, and Strategy for Disaster Readiness," American Journal of 
Community Psychology, vol. 41, pp. 127-150, 2008. 

B. Nunberg, Re-thinking civil service reform: An agenda for smart government: World Bank, 1997. 

P. R. Orszag, "Open Government Directive: Memorandum for the heads of executive departments and 
agencies," Executive Office of the President, Office of Management and Budget, Ed. Washington, 
DC: The White House, 2009, pp. 1-11. 

A. Reinhardt and L. Steiner, "E-Energy German Smart Grid Projects Overview." vol. 2013: Federal 
Ministry of Economics and Technology (Bundesministerium für Wirtschaft und Technologie), 2010. 

J. Rifkin, The third industrial revolution : how lateral power is transforming energy, the economy, and the 
world. New York: Palgrave Macmillan, 2011. 

A. S. Roberts, "Less Government, More Secrecy: Reinvention and the Weakening of Freedom of Information 
Law," Public Administration Review, vol. 60, pp. 308-320, 2000. 

C. Scartascini, E. Stein, and M. Tommasi, "How Do Political Institutions Work? Veto Players, 
Intertemporal Interactions, and Policy Adaptability," Washington, DC, United States: Inter- 
American Development Bank. Mimeographed document, 2008. 

H. J. Scholl, "E-government-induced business process change (BPC): An empirical study of current 
practices," International Journal of Electronic Government Research, vol. 1, pp. 25-47, Jan-Mar 
2005. 

H. J. Scholl, "Five trends that matter: Challenges to 21st century electronic government," Information 
Polity, vol. 17, pp. 317-327, 2012. 

H. J. Scholl and R. Klischewski, "E-Government Integration and Interoperability: Framing the Research 
Agenda," International Journal of Public Administration, vol. 30, pp. 889-920, 2007. 

H. J. Scholl, H. Kubicek, R. Cimander, and R. Klischewski, "Process integration, information sharing, and 
system interoperation in government: A comparative case analysis," Government Information 
Quarterly, vol. 29, pp. 313-323, 2012. 

H. J. Scholl and L. F. Luna-Reyes, "Transparency and openness in government: a system dynamics 
perspective," in Proceedings of the 5th International Conference on Theory and Practice of 
Electronic Governance, Tallinn, Estonia, 2011, pp. 107-114. 




H. J. Scholl and L. F. Luna-Reyes, "Uncovering Dynamics of Open Government, Transparency, 
Participation, and Collaboration," in Proceedings of the 44th Annual Hawaii International 
Conference on System Sciences (HICSS 2011), Kauai, Hawaii USA, 2011, pp. 1-11. 

S. Shapiro, "The Paperwork Reduction Act: Benefits, costs and directions for reform," Government 
Information Quarterly, vol. 30, pp. 204-210, 2013. 

P. Sobkowicz, M. Kaschesky, and G. Bouchard, "Opinion mining in social media: Modeling, simulating, and 
forecasting political opinions in the web," Government Information Quarterly, vol. 29, pp. 470-479, 
2012. 

R. Sudan, "Towards SMART government: the Andhra Pradesh experience," Indian Journal of Public 
Administration, vol. 46, pp. 401-410, 2000. 

G. Tullock, "Government Spending." vol. 2012: Library of Economics and Liberty, 2002. 

H. Wallace, "Policy-Making in the European Union (New European Union Series)," 2000. 

S. F. Wamba and L. Carter, "Twitter Adoption and Use by SMEs: An Empirical Study," in 46th Hawaii 
International Conference on System Sciences (HICSS-46), Wailea, HI, USA, 2013, pp. 2042-2049. 

H. Willke, Smart governance: Governing the global knowledge society: Campus Verlag Gmbh, 2007. 


6 Table of Figures 


Figure 1: Problem Outcome Matrix - Type A and Type B Outcomes are understudied 
Figure 2: The Trajectory from E-Government Research to Smart and Open Government Research 


7 Table of Tables 


Table 1: Areas of Focus (columns) and Elements of Smart Governance (rows) 


8 Appendix 


Areas of focus (columns): 
  - Budgeting/controlling/evaluating 
  - Electronic government/administrative modernization/process streamlining 
  - Security and Safety 
  - Infrastructure Overhaul and Ubiquitous High-speed Connectivity 
  - Electric Mobility 
  - Participation and Collaboration 
  - Open Data / Big Data Provision and Use 

Elements of smart governance (rows): 
  - Norms 
  - Policies 
  - Practices 
  - Information 
  - ICTs and Other Technologies 
  - Skills and Human Capital 
  - Other resources 

[Table body: a label overlaid across the cells, not legibly recoverable in this copy] 


Table 1: Areas of Focus (columns) and Elements of Smart Governance (rows) 




A Crowdsourcing Approach for Finding Misidentifications of Bibliographic Records 


Atsuyuki Morishima', Tomita Shiori’, Takanori Kawashima’, Takashi Harada’, Norihiko 
Uda!, Sho Sato? and Yukihiko Abematsu! 

1 University of Tsukuba 

2 National Diet Library 

3 Doshisha University 


Abstract 

Because there is no perfect technique for automatic identification of bibliographic records, cleaning the 
identification results manually is indispensable. However, recruiting human resources for the task is often 
difficult. This paper discusses a microtask-based crowdsourcing approach to the problem. An important 
issue is to design a good strategy for generating the tasks to be assigned to workers, one that maintains 
quality while reducing the number of tasks. In this study, we explore a design space defined by two criteria 
to reduce the number of assigned microtasks for finding misidentifications caused by automatic identification 
techniques. We compare four task-generation strategies using bibliographic records of the National Diet 
Library. One of the strategies reduced the number of tasks by 55.7% relative to the baseline strategy, and 
statistical analysis showed that the quality of its result is comparable to those of the other three strategies. 


1 Introduction 


Recently, there have been many attempts to construct union catalogs by collecting and joining bibliographic 
records originally managed by different organizations. However, it is often the case that one source (e.g., a 
book) is represented by more than one bibliographic record. Therefore, it is common practice to conduct 
automatic identification of bibliographic records in which algorithms identify the records that represent the 
same source. For example, the National Diet Library (NDL) in Japan conducts automatic identification for 
constructing its union catalog. 

A common approach to automatic identification is to use algorithms for computing the 
identification key for each record, and to determine that records having the same key represent the same 
source. In many cases, automatic identification algorithms use ISBNs to compute identification keys, 
because they are the only numbers that exist in virtually all bibliographic records. 
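
The key-based approach described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (dictionaries with `isbn` and `title` fields), the normalization rule, the function names, and the ISBN strings are all made up for the example and do not describe the NDL's actual algorithm.

```python
from collections import defaultdict


def identification_key(record):
    """Hypothetical key: the ISBN with hyphens and spaces removed, upper-cased."""
    return record["isbn"].replace("-", "").replace(" ", "").upper()


def group_by_key(records):
    """Treat records that share a key as candidates for the same source."""
    groups = defaultdict(list)
    for record in records:
        groups[identification_key(record)].append(record)
    return dict(groups)


# Made-up records; the ISBN strings are placeholders, not real ISBNs.
records = [
    {"isbn": "000-0-00-000001-0", "title": "Towards the e-society"},
    {"isbn": "0000000000010", "title": "Towards the E-Society"},
    {"isbn": "000-0-00-000002-0", "title": "Another book"},
]
groups = group_by_key(records)
```

Records that end up in the same group are only candidates for the same source; as the following paragraphs explain, a shared key does not guarantee that the identification is correct.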

However, it is well known that there is no perfect technique for automatic identification of 
bibliographic records. There are several reasons for this. First, the contents of different records representing 
the same source are not necessarily the same if their creators are different; attributes such as title may have 
spelling variants. A record may omit some words or may have values that should be entered for other attributes. 
For example, Figure 1 shows two apparently different records that represent the same source. The strings 
in the title attributes are slightly different from each other, and the string in the series attribute of the 
second record is an abbreviation. NDL also has many record pairs in which both represent the same source 
but one is written in Japanese while the other is in English. Note that the machine can determine that two 
records represent the same source only if both the identification keys and other information in the records 
are exactly the same. If not, the machine cannot conclude whether the two sources (books) are different or 
the same, because it is difficult for the machine to determine whether different words and sentences in two 
records are semantically equivalent to each other. 

Second, the values used to compute identification keys are often inappropriately assigned to the 
sources. For example, a recent study by NDL found that it is not rare for different sources to have the 
same ISBN (publishing a detailed report on this topic is part of our future work). One reason is that publishers 
often reuse the same ISBN for different publications. Another reason is that bibliographic records tend not 
to be updated after an incorrect ISBN is given to the record. Keys derived from such inappropriate numbers 
lead to misidentifications by automatic identification algorithms, i.e., different sources are incorrectly 
determined to be the same source. 

Therefore, it is necessary to manually verify the automatically identified records for final 
identification. Humans are good at determining whether the difference in two records implies a 
misidentification, because they can tell whether two different expressions are semantically equivalent. 
However, the number of records that require manual verification can be large, and it is often difficult to 
recruit human resources for the task. 


Record 1 
  Title: Towards the e-society : e-commerce, e-business, and e-government : the first IFIP Conference on 
    E-Commerce, E-Business, E-Government (I3E 2001), October 3-5, 2001, Zurich, Switzerland / edited by 
    Beat Schmid, Katarina Stanoevska-Slabeva, Volker Tschammer 
  Series: The International Federation for Information Processing ; 74 
  Publisher: Kluwer Academic Publishers 

Record 2 
  Title: Towards the e-society: e-commerce, e-business, and e-government : the first IFIP conference on 
    e-commerce, e-business, e-government (I3E 2001) October 3-5, 2001, Zurich, Switzerland. : Oct 2001, 
    Zurich, Switzerland 
  Series: IFIP ; 74 
  Publisher: Kluwer Academic Publishers 


Figure 1: Bibliographic records having the same ISBN 


This paper discusses a crowdsourcing approach to the problem that is taken by the L-Crowd project 
(http://crowd4u.org/lcrowd). To our knowledge, L-Crowd is one of the largest crowdsourcing projects for 
library-related problems; the project involves core members and collaborators from more than 16 universities 
and the NDL of Japan. 

In the project, we crowdsource microtasks designed for solving library-related problems. 
Here microtasks are tasks that can be performed in a short period of time. For example, Figure 2 is a 
microtask that asks a human whether the two faces in the photographs are the same person. 

Microtask-based crowdsourcing is a popular form of crowdsourcing, with many microtask-based 
crowdsourcing platforms such as Amazon’s Mechanical Turk. In general, a microtask-based crowdsourcing 
platform has a task pool into which requesters register microtasks that will be assigned to workers. L-Crowd 
uses a microtask-based crowdsourcing platform named Crowd4U (http://crowd4u.org) (Morishima, 
Shinagawa, Mitsuishi, Aoki, & Fukusumi, 2012). Crowd4U is deployed at universities, and many anonymous 
or registered volunteers perform the microtasks registered in Crowd4U’s task pool. 

The contributions of the paper are as follows: 


(1) Crowdsourcing for identification of bibliographic records. To our knowledge, this paper is the 
first to discuss microtask-based crowdsourcing in the identification of bibliographic records. There are 
attempts to use crowdsourcing to solve library-related problems. For example, in the “Civil War Faces” 
project, the Library of Congress crowdsources the tagging of photos through Flickr 
(http://www.flickr.com/photos/library_of_congress/sets/72157625520211184/). The Australian National 
Library crowdsources proofreading the results of applying OCR to old newspapers 
(http://trove.nla.gov.au/ndp/del/home). The Bodleian Library of the University of Oxford crowdsources 
enhancing metadata of music scores (http://whatsthescoreatthebodleian.wordpress.com/). Our approach is 
unique in the following two ways. First, we use human power to find semantic equivalence between different 
expressions, while others use it to recognize patterns in images. Second, we discuss the strategies for 
generating tasks in a scientific way, while we are not aware of any other projects that give scientific 
justifications for their design decisions. This paper is the first to present a formal framework for the 
application of crowdsourcing to the problem of finding misidentifications. 


(2) Novel technique for generating microtasks. When we apply microtask-based crowdsourcing, what 
constitutes an appropriate design of microtasks and how to generate them is an important issue. The design 
and generation strategy of microtasks affects both the number of microtasks necessary to reach the goal 
and the quality of the output data. This paper explores a design space of microtask-based crowdsourcing 
that is defined by two criteria for finding misidentifications of bibliographic records. One of the two criteria, 
called contraction, is a novel technique we proposed (Tomita, Morishima, Uda, & Harada, 2013) for 
extending the design space. In general, contraction is a technique in graph theory to merge different nodes 
into one in a given graph (Wilson, 2010), and is often used to reduce the size of the graph without losing 
certain important properties of the graph. We show that the technique is effective in reducing the number of 
microtasks in the problem of finding misidentifications, while keeping data quality acceptable. 
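
To make the general graph-theoretic operation concrete, the following is a minimal sketch of contracting two nodes of an undirected graph. It illustrates only the generic operation mentioned above; how contraction is applied to task generation is defined later in the paper, and the adjacency-set representation, the function name, and the node labels here are assumptions chosen for the example.

```python
def contract(graph, u, v):
    """Return a new undirected graph in which node v is merged into node u.

    `graph` maps each node to the set of its neighbours. Edges incident to v
    are redirected to u, and the self-loop created by an edge u-v is dropped.
    """
    merged = {}
    for node, neighbours in graph.items():
        if node == v:
            continue  # v disappears from the contracted graph
        merged[node] = {u if n == v else n for n in neighbours}
    # u inherits v's neighbours, minus any reference to u or v themselves.
    merged[u] = (merged[u] | set(graph[v])) - {u, v}
    return merged


# Hypothetical example: a path c - a - b - d, contracting b into a.
graph = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a"}, "d": {"b"}}
contracted = contract(graph, "a", "b")
```

After the contraction the graph has three nodes instead of four, and the merged node `a` is adjacent to both `c` and `d`, so connectivity is preserved while the graph becomes smaller.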


(3) Experiments with NDL data. We conducted an experiment using a real set of NDL bibliographic 
records to compare four variations of microtask generation strategies within the design space. As explained, 
the NDL conducted automatic identification for constructing its union catalog and suffered from the 
problem of misidentifications. Our experimental results suggest that crowdsourcing is promising in a real 
setting. In addition, we plan to use the proposed scheme to obtain data to improve the query results of 
bibliographic search in other libraries in Japan. 


The remainder of this paper is organized as follows. Section 2 explains related work. Section 3 formalizes our problem 
as a human-powered join. Section 4 presents four strategies for generating microtasks in a design space 
defined by two independent criteria. Section 5 shows the results of our experiments to compare the four 
task-generation strategies, and Section 6 is the summary. 


2 Related Work 


Our problem is related to various topics from different domains. This section enumerates some of them. 


Automatic Identification of Bibliographic Records. Automatic identification techniques compute 
identification keys from bibliographic records and conclude that two records represent the same source if 
they have the same keys. For example, a study attempted to develop appropriate identification keys for 
Unicanet, a well-known union catalog network in Japan 
(http://unicanet.ndl.go.jp/psrch/redirect.jsp?type=psrch). Another example is the NDL search 
(http://iss.ndl.go.jp/), which implements automatic identification techniques using several variations of 
identification key. Both utilize ISBNs as part of the identification keys. However, publishers do not 
necessarily assign ISBNs appropriately. We plan to apply our technique to find misidentifications due to 
inappropriate assignment of ISBNs and analyze the obtained results for improving the quality of 
identification keys. 


Efficient Processing of Human-powered Operations. Strategies for generating microtasks are an 
important factor for successful crowdsourcing. For example, in some cases, workers tend to avoid performing 
a complex task even if they are paid more than the sum of the price for each component of the original task 
(Kittur, Smus, Khamkar, & Kraut, 2011). In essence, our problem can be modeled as a human-powered 
join (Marcus, Wu, Karger, Madden, & Miller, 2011), an operation to determine whether a pair of data items 
satisfy a given condition. In our setting, the condition is that two bibliographic records with the same ISBN 
represent different sources. Efficient processing of human-powered joins is discussed in existing papers 
(Marcus, Wu, Karger, Madden, & Miller, 2011; Mitsuishi, Morishima, Shinagawa, & Aoki, 2013). Proposed 
techniques include (1) changing the size of tasks; (2) changing the presentation of microtasks; and (3) using 
the power of crowd to reduce the number of tasks. 

Recently, Wang et al. independently proposed to consider transitive relations to reduce the number 
of required tasks in human-powered joins (Wang, Li, Kraska, Franklin, & Feng, 2013). The proposed 
technique is related to our contraction technique in the sense that they use the results for some pairs to 
infer the results of other pairs. One of the contributions of our paper is that it shows our design space for 
task-generation strategies that incorporates this family of optimization techniques is effective in finding 
misidentifications of bibliographic records, a real problem in the library domain. 


Data Quality in Crowdsourcing. Another important issue in data-centric crowdsourcing is data 
quality. Since many techniques for improving data quality are independent of the task design, our proposed 
technique can be combined with many existing techniques. For example, many crowdsourcing systems adopt 
majority voting (Marcus, Wu, Karger, Madden, & Miller, 2011), a technique that relies on the law of large 
numbers. In Section 5, we show that one of our task-generation strategies significantly reduces the number 
of tasks and that the quality of its output is comparable to the others when we adopt majority voting. Another 
approach is a coordination game (Morishima, Shinagawa, & Mochizuki, 2011), in which rational workers 
give appropriate values. Jain and Parkes (2008) provide a game-theoretic analysis of games with a purpose 
for obtaining data, and show that a simple change of the incentive structure can affect the obtained data. 


3 Identification of Bibliographic Records 


In this section, we first briefly explain the problem of identification of bibliographic records. Then, we 
formalize our problem as a human-powered join. 


3.1 Identification of Bibliographic Records 


Given two bibliographic records b_i and b_j, identification of bibliographic records is to determine whether b_i 
and b_j represent the same source. 

This can be done manually by experts, or automatically by machines (algorithms) if bibliographic 
records are machine-readable. The latter is called automatic identification of bibliographic records. A general 
approach for automatic identification is to compute identification keys using data encoded in bibliographic 
records and then to determine that two bibliographic records refer to the same source if they have the same 
key. Typically, identification keys are computed using ISBNs (ISO 2108:2005) and MARC numbers (ISO 
2709:2008), which are assigned by publishers and MARC management institutes, respectively, for 
distinguishing publications. However, at present, ISBNs are the only numbers that exist in virtually all 
bibliographic records. Other numbers, such as MARC numbers, are not in common use compared to ISBNs. 
Therefore, they are used as a supplementary means in automatic identification techniques. 

The problem is that the results of automatic identification techniques are not necessarily correct, 
because it is impossible to construct a perfect identification key (Taniguchi, 2009). There are several reasons 
for this. First, because the bibliographic records are created manually, the created records often have minor 
variations or are just incorrect. Second, even if the information written in the records is correct, ISBNs, 
which in many cases are used to construct identification keys, are often inappropriately assigned by 
publishers to sources so that they cannot work as perfect identifiers. 


3.2 Formalization 


Given that the results of automatic identification of bibliographic records are not necessarily correct, and 
that we need to use human power to solve the problem, we apply a crowdsourcing approach to the problem 
of finding misidentifications. In this section, we formalize our problem as a human-powered join (Marcus, 
Wu, Karger, Madden, & Miller, 2011). A human-powered join is a special form of the join operation of 
database relations (tables). Logically, the join operation first enumerates every pair of tuples (rows) of two 
relations and then selects from the candidate pairs those pairs that satisfy the given condition, which will 
be included in the results. In a human-powered join, humans determine whether each pair of data items 
satisfies the given condition. For each pair of data items, a task is invoked to ask a worker whether a pair 
of data items satisfies a condition. For example, assume that we have a relation of photos of human faces 
and want to self-join the relation to find pairs of photos that show the same person. Then, Figure 2 is an 
example of a microtask that asks a worker whether the persons in two photos are the same. 

Note that the model is compatible with the formalization of the record linkage problem presented 
in (Gu, Baxter, Vickers, & Rainsford, 2003), in which the problem is modeled as computing the Cartesian 
product of sets of records and then checking whether two records in each pair match with each other. 
Modeling our problem as a human-powered join generalizes this formalization so that the process of checking 
whether the condition holds is partly crowdsourced. 

Bibliographic Records. We use a relation with a relational schema B(tid,record) to store a set of 
bibliographic records (Figure 3). Here, tid is a tuple identifier and record is a relational attribute to store 
each bibliographic record. Note that tid is not an identifier of sources represented by records, but that of 
tuples in the relation. 

Automatic Identification. Each technique for automatic identification of bibliographic records using 
identification keys can be modeled as a self-join of Relation B as follows: 


$$B \bowtie_{key(record) = key(record') \,\land\, tid < tid'} B'$$


Here B' is an alias of Relation B to distinguish two relations in the self-join. key(v) is a function defined 
by each automatic identification technique to compute identification keys. As mentioned, each identification 
key is computed using values contained in record in automatic identification techniques. We need tid < 
tid' to avoid re-evaluating the same record pair, i.e., to avoid evaluating the pair (tid_2, tid_1) if the pair 
(tid_1, tid_2) has already been evaluated. 
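
The self-join above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the `key` function (which simply reads a hypothetical `isbn` field) and the sample records are invented for illustration and are not the keys or data of any actual catalog.

```python
from collections import defaultdict
from itertools import combinations

def key(record):
    # Hypothetical identification key: the ISBN field only.
    return record["isbn"]

def self_join(relation):
    """Return (tid, tid') pairs whose records share a key, with tid < tid'."""
    groups = defaultdict(list)
    for tid, record in relation:
        groups[key(record)].append(tid)
    pairs = []
    for tids in groups.values():
        # combinations over sorted tids yields each pair exactly once.
        pairs.extend(combinations(sorted(tids), 2))
    return sorted(pairs)

# Invented sample relation B(tid, record).
B = [
    (1, {"title": "Towards the e-society", "isbn": "X-1"}),
    (2, {"title": "The XML book", "isbn": "Y-2"}),
    (3, {"title": "Towards e-society", "isbn": "X-1"}),
]
candidate_pairs = self_join(B)
```

Grouping by key before pairing avoids enumerating the full Cartesian product, while the condition tid < tid' is enforced by pairing sorted tuple identifiers.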


Figure 2: Example of a task for a human-powered join, asking "Are they the same person?" (photos are 
taken from the JAFFE Database (Lyons, Akamatsu, Kamachi, & Gyoba, 1998)) 


tid    record 
1      (Towards the e-society:..., ..., X publisher) 
2      (The XML book, ... , ..., Y company) 
3      (Towards e-society:, ..., ..., X publishing) 


Figure 3: Example of a relation B(tid, record) 


Finding misidentifications by human-powered joins. The process of finding misidentifications can then 
be written as a human-powered join as follows: 

$$B \bowtie_{key(record) = key(record') \,\land\, tid < tid' \,\land\, diff(record, record')} B'$$


Here diff(record, record') is the condition that the workers check for each pair of records. The 
condition holds if record and record' represent different sources (books). Therefore, the result of the 
human-powered join will contain the set of record pairs, each of which has two records with the same key that 
are considered by workers to represent different sources. 
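
As a minimal sketch of this human-powered join for a single key group, the Python fragment below asks a callback about each pair once. The callback `ask_worker` is a hypothetical stand-in for a crowdsourced microtask, and the `edition` field and sample records are assumptions made only for this illustration.

```python
from itertools import combinations

def human_powered_join(group, ask_worker):
    """Return (tid, tid') pairs of one key group that a worker judges different.

    `group` is a list of (tid, record) tuples sharing one identification key.
    `ask_worker(record, record')` stands in for the diff condition.
    """
    ordered = sorted(group, key=lambda item: item[0])  # ensures tid < tid'
    result = []
    for (tid1, rec1), (tid2, rec2) in combinations(ordered, 2):
        if ask_worker(rec1, rec2):
            result.append((tid1, tid2))
    return result

# Simulated worker: judges two records different if their editions differ.
def simulated_worker(rec1, rec2):
    return rec1["edition"] != rec2["edition"]

group = [
    (1, {"edition": "1st"}),
    (3, {"edition": "2nd"}),
    (4, {"edition": "1st"}),
]
misidentified = human_powered_join(group, simulated_worker)
```

In a real deployment the callback would register a microtask and wait for an answer; here it is simulated so that the control flow can be run locally.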


Definition of the same source. Assume that b_1 and b_2 have the same identification key. We define that 
b_1 and b_2 represent the same source if they are equivalent at the expression level defined by FRBR (IFLA 
Study Group on the Functional Requirements for Bibliographic Records, 1998). In other words, diff(b_1, b_2) 
holds if b_1 and b_2 represent sources that are different from each other at the expression level. The reason we 
chose the expression-level equivalence is that ISBNs and MARC numbers are assumed to be unique at this 
level. 


4 Strategies for Generating Microtasks 


In our approach, we need to generate microtasks for asking workers whether diff(record,record') holds for 
given record and record’. In this section, we introduce two criteria to define the space for designing 
strategies, and explain four strategies within it. 

To discuss the strategies, we introduce groups of bibliographic records. A group G_k is a set of 
bibliographic records with the identification key k and is defined as follows: 

$$G_k = \{record \mid B(tid, record),\ key(record) = k\}.$$

For a given G_k, each task strategy generates a set of microtasks to obtain the set 
$\{(r, r') \mid r, r' \in G_k,\ diff(r, r')\}$. 
Therefore, we can discuss the generation of microtasks for a particular given group without losing generality. 
In the following discussions, we assume that k is an integer and that we have G_1 = {r_1, r_2, r_3, r_4}. 


Figure 4: Contraction 


4.1 Task Template and the Design Space 


Our design space is defined by a task template for microtasks and two criteria to affect the number of 
generated tasks. We first explain the task template and then explain two criteria, namely, simultaneous 
comparison and contraction. A task-generation strategy is good if it requires a smaller number of tasks while 
the quality of its results remains comparable to, or better than, that of the others. 


Task template. We use BTask(r, R) to denote a task template to generate microtasks for finding 
misidentifications. Here r is a bibliographic record and R is a set of records to be compared to r. Tasks are 
instantiated by the task template with parameters. For example, let r_1, r_2, r_3, and r_4 be bibliographic 
records. Then, BTask(r_1, {r_2, r_3, r_4}) is a task for asking workers whether any of r_2, r_3, and r_4 are 
different from r_1. Figure 8 is a screenshot of BTask(r_1, {r_2, r_3, r_4}). 


First criterion: size of the task. The first criterion is the size of R. For pairwise comparison, we set |R| 
= 1. For example, we use BTask(r_1, {r_2}) to discover whether diff(r_1, r_2) holds. On the other hand, tasks for 
simultaneous comparison can be implemented by setting |R| > 1. For example, from the result for 
BTask(r_1, {r_2, r_3}), we can obtain information on both diff(r_1, r_2) and diff(r_1, r_3). 


Second criterion: contraction. The second criterion is whether we apply contraction to the bibliographic 
records in the process of the human-powered join. Contraction is a graph-theoretic technique that merges 
two nodes of a given graph into one; the edges connected to the deleted node are inherited by 
the merged node. In our context, we use the technique to merge two bibliographic records into one if we 
know the two records represent the same source. This is done to reduce the number of comparisons without 
changing the result of the human-powered join. For example, assume that we have a group that has four 
bibliographic records and requires six comparisons (Figure 4). If we could merge record b into record a, the 
number of comparisons would be reduced to three, and the comparison results for the removed edges would 
be derived from the results of comparing a to c and d. 
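
The four-record example can be checked with a small Python sketch. The function `contract` and its return format are hypothetical names introduced here for illustration; the sketch assumes a complete comparison graph over the records of one group.

```python
from itertools import combinations

def contract(nodes, keep, drop):
    """Merge node `drop` into node `keep` in a complete comparison graph.

    Returns the comparisons that remain and, for each removed comparison,
    the remaining comparison whose result it inherits.
    """
    remaining = [n for n in nodes if n != drop]
    edges = list(combinations(remaining, 2))
    inherited = {(drop, n): (keep, n) for n in remaining if n != keep}
    return edges, inherited

nodes = ["a", "b", "c", "d"]
all_edges = list(combinations(nodes, 2))  # comparisons before contraction
edges, inherited = contract(nodes, keep="a", drop="b")
```

After merging b into a, only the comparisons among a, c, and d remain, and the results for (b, c) and (b, d) are taken from (a, c) and (a, d).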

We developed four strategies for generating microtasks in the design space. First, the simplest 
strategy, A1, uses pairwise comparison and does not adopt contraction. Second, B1 uses simultaneous 
comparison and does not adopt contraction. Third, A2 performs pairwise comparisons with the contraction 
technique. Finally, B2 performs simultaneous comparisons with the contraction technique. 


4.2 A1: The Simplest Strategy 


A1 is the simplest strategy: it uses pairwise comparison and does not adopt contraction. Figure 6 
shows an example of an A1 task. The generation process is as follows. First, for each G_k, A1 constructs a 
set P_k of pairs such that $P_k = B \bowtie_{key(record) = key(record') = k \,\land\, tid < tid'} B'$. For G_1, 
P_1 = {(r_1, r_2), (r_1, r_3), (r_1, r_4), (r_2, r_3), (r_2, r_4), (r_3, r_4)}. In general, 
$|P_k| = \binom{|G_k|}{2}$. Note that we can use P_k as an intermediate 
result to compute our human-powered join defined in Section 3.2, i.e., the join result can be computed by 
$\sigma_{diff(r_i, r_j)}(P_k)$. In other words, computing P_k is part of the process of our human-powered join. 

Then, the second step is to generate microtasks to ask workers which pairs in P_k satisfy diff(r_i, r_j). 
A1 generates BTask(r_i, {r_j}) for each pair (r_i, r_j) in P_k. In our example, we obtain six A1 tasks because 
|P_1| = 6. 

Figure 5 shows the algorithm used to generate A1 tasks, where a task is generated (Line 4) for each 
pair in P_k (Line 3). 


Set of BTasks generateTasks(P_k) { 
  tasks = {}; 
  for each (r_i, r_j) in P_k { 
    tasks = tasks + {BTask(r_i, {r_j})}; 
  } 
  return tasks; 
} 


Figure 5: Algorithm to generate A1 tasks 


Figure 6: Generating an A1 task 


4.3 B1: Simultaneous Comparison 


B1 allows simultaneous comparison but does not adopt contraction. B1 uses tasks represented by 
BTask(r, R) where |R| ≥ 1. For example, Figure 8 is BTask(r_1, {r_2, r_3, r_4}). B1 takes a pre-fixed 
parameter m that represents the maximum size of R. If we generate B1 tasks with m = 3 for the set G_1 of 
bibliographic records, we get three B1 tasks: BTask(r_1, {r_2, r_3, r_4}), BTask(r_2, {r_3, r_4}), and 
BTask(r_3, {r_4}). Note that we can obtain from the results of the three tasks whether diff(r_i, r_j) holds for 
every pair in P_1. 

Figure 7 is the algorithm to generate B1 tasks given P_k and m. In short, we enumerate every pair 
(r_i, R_i) such that (1) r_i is a record that appears on the left side of a pair in P_k (Line 3) and (2) R_i is a set 
of m records, each r_j of which appears as (r_i, r_j) in P_k (Lines 4-7). For each such pair, we generate 
BTask(r_i, R_i) (Line 7). 

To obtain the m records, the algorithm utilizes a stack to store each r_j that is paired with r_i (Lines 5, 
7). In Lines 10-11, it generates a task when we have n (0 < n < m) records remaining in the stack for r_i. 
The algorithm does not generate tasks that produce duplicate results because P_k contains no (r_j, r_i) if it 
contains (r_i, r_j). 


1.  Set of BTasks generateTasks(P_k, m) { 
2.    tasks = {}; 
3.    for each r_i in P_k.leftRecords { 
4.      for each (r_i, r_j) in P_k { 
5.        stack.push(r_j); 
6.        if (stack.length == m) { 
7.          tasks = tasks + {BTask(r_i, {stack.allpop})}; 
8.        } 
9.      } 
10.     if (stack.length > 0) 
11.       tasks = tasks + {BTask(r_i, {stack.allpop})}; 
12.   } 
13.   return tasks; 
14. } 


Figure 7: Algorithm to generate B1/B2 tasks 
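
For readers who want to run the procedure, the following Python sketch mirrors the steps of Figure 7. It is an illustrative rendering rather than the authors' implementation: tasks are represented as plain tuples (r_i, R_i), and the function name `generate_b1_tasks` is an assumption introduced here.

```python
def generate_b1_tasks(pairs, m):
    """Generate (r_i, R_i) tasks with |R_i| <= m from an ordered list of pairs."""
    # Left-hand records in order of first appearance.
    left_records = []
    for r_i, _ in pairs:
        if r_i not in left_records:
            left_records.append(r_i)

    tasks = []
    for r_i in left_records:
        stack = []
        for left, r_j in pairs:
            if left != r_i:
                continue
            stack.append(r_j)
            if len(stack) == m:
                # Emit a full task and empty the stack.
                tasks.append((r_i, tuple(stack)))
                stack = []
        if stack:
            # Emit the remaining records (fewer than m) as a final task.
            tasks.append((r_i, tuple(stack)))
    return tasks

P_1 = [("r1", "r2"), ("r1", "r3"), ("r1", "r4"),
       ("r2", "r3"), ("r2", "r4"), ("r3", "r4")]
b1_tasks = generate_b1_tasks(P_1, m=3)
a1_tasks = generate_b1_tasks(P_1, m=1)  # m = 1 degenerates to pairwise tasks
```

With m = 3 the sketch yields the three tasks given in the text for G_1, and with m = 1 it yields one task per pair, as in A1.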


Figure 8: Example of a B1 task 


4.4 A2 and B2: Introducing Contraction 


A2 (B2) is the same as A1 (B1) except that it adopts the contraction technique for reducing the number of 
tasks. If a worker determines that diff(r_i, r_j) does not hold for the given pair (r_i, r_j), we consider r_i and 
r_j to be equivalent in the sense that the results of tasks involving r_i are the same as those involving r_j. 
Therefore, we can omit the tasks involving r_i and reduce the number of tasks. 

The A2 tasks are generated in the following way. First, we construct the set P_k of pairs in the same 
way as for A1 tasks. As with A1, the remaining step is to generate microtasks to discover whether each pair 
in P_k satisfies diff(r_i, r_j). During the process, we use the results of already performed tasks to remove a 
pair (r_i, r_j) from P_k if we know whether diff(r_i, r_j) holds from other results. Finally, we stop the process 
when there is no pair (r_i, r_j) in P_k for which we do not know whether diff(r_i, r_j) holds. 

Figure 9 shows the algorithm to generate tasks in A2. It generates a task for each pair in Px (Lines 
3-4), but if the result of a performed task suggests that the two records represent the same source, (1) the 
algorithm computes another pair p that would produce duplicate results (Line 9), (2) keeps p and the 
original pair in set equiv (Line 10), and (3) removes p from Px (Line 11). Finally, equiv is used to 
produce the results for the removed pairs (Lines 16-18). 
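
The effect of contraction on the number of tasks can be explored with the Python sketch below. It is a simplified variant, not the exact bookkeeping of Figure 9: it tracks a representative for each merged record and re-attaches known "different" results to the surviving representative. The names `a2_join` and `ask_worker`, and the sample ground truth, are assumptions made for illustration.

```python
def a2_join(pairs, ask_worker):
    """Pairwise join with contraction; returns (result pairs, tasks asked)."""
    rep = {}           # merged record -> record it was merged into
    different = set()  # frozensets of two representatives judged different
    asked = 0

    def find(r):
        # Follow merges until reaching the current representative.
        while r in rep:
            r = rep[r]
        return r

    for r_i, r_j in pairs:
        a, b = find(r_i), find(r_j)
        edge = frozenset((a, b))
        if a == b or edge in different:
            continue  # result already known; no task is needed
        asked += 1
        if ask_worker(a, b):
            different.add(edge)
        else:
            rep[b] = a  # contraction: merge b into a
            # Edges of the deleted node are inherited by the merged node.
            for old in [e for e in different if b in e]:
                (other,) = old - {b}
                different.discard(old)
                different.add(frozenset((a, other)))

    result = [(r_i, r_j) for r_i, r_j in pairs
              if frozenset((find(r_i), find(r_j))) in different]
    return result, asked

# Hypothetical ground truth: r1 and r2 describe the same source.
source = {"r1": "X", "r2": "X", "r3": "Y", "r4": "Z"}
P_1 = [("r1", "r2"), ("r1", "r3"), ("r1", "r4"),
       ("r2", "r3"), ("r2", "r4"), ("r3", "r4")]
result, asked = a2_join(P_1, lambda a, b: source[a] != source[b])
```

In this example the pairs (r2, r3) and (r2, r4) are never sent to a worker because their results follow from those of (r1, r3) and (r1, r4), so four tasks are asked instead of six.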

We can obtain the algorithm for B2 by slightly changing the algorithm in Figure 9. First, we call 
the algorithm shown in Figure 7 to produce the initial set of microtasks. Second, we insert a call for the 
same algorithm between Lines 11 and 12 to re-generate the set of microtasks. Finally, in Line 4, we register 
the generated tasks (instead of BTask(ri,{rj})) to the task pool. We omit the detail because it is 
straightforward. 


1.  result = {}  // stores the result set of pairs. 
2.  equiv = {}   // a set of pairs whose results are the same. 
3.  for each (r_i, r_j) in P_k { 
4.    TaskPool.register(BTask(r_i, {r_j})); 
5.    Let isDiff be the result of BTask(r_i, {r_j}). 
6.    if (isDiff) result.add((r_i, r_j)); 
7.    else { // two records represent the same source 
8.      for each r_q s.t. r_q.tid > r_i.tid { 
9.        if (p in P_k s.t. p = (r_q, r_i) or (r_i, r_q)) { 
10.         equiv.add((p, (r_q, r_j))); 
11.         P_k.remove(p); 
12.       } 
13.     } 
14.   } 
15. } 
16. for each (p, p') in equiv { 
17.   if (p' in result) result.add(p); 
18. } 


Figure 9: Algorithm of A2 


5 Experiment 

This section explains the results of our experiment. The purpose of the experiment is twofold. First, we 
want to know whether the microtask-based crowdsourcing approach is applicable to the problem of finding 
misidentification of bibliographic records in a real setting. Second, we want to understand how changing 
task-generation strategies affects the process and results. In the experiment, we compared four 
task-generation strategies by the number of tasks, the elapsed times for performing tasks, and the quality of data 
in terms of precision and recall. Overall, we concluded that crowdsourcing is applicable to our problem, and 
B2 significantly reduced the number of tasks while keeping the quality of its result comparable to that of 
the other strategies. 


5.1 Settings 


Crowdsourcing Platform. We used Crowd4U (http://crowd4u.org), a microtask-based crowdsourcing 
platform for academic purposes. Crowd4U is deployed at many universities in Japan, with anonymous and 
registered workers performing microtasks registered in its task pool. Figure 10 shows a screenshot of a 
Crowd4U microtask. 


Figure 10: Screenshot of a task on Crowd4U 


Data. In the experiment, we used bibliographic records of the unified catalog of the NDL. Prior to our 
experiment, the NDL applied automatic identification using ISBNs as identification keys to their records. 
Then, they selected the records whose ISBNs are not unique. 

Given the set of records, we selected bibliographic records for sources written in Japanese. In the 
experiment, the native language of all workers was Japanese and we did not want to introduce another 
factor when comparing different microtask designs. As a result, we obtained 34,254 records with 11,509 
different ISBN groups. Table 1 shows the distribution of sizes of the groups. 

Then, we conducted random sampling to extract 3% of the original groups, taking into account the 
distribution of the sizes of the ISBN groups. As a result, we obtained 341 groups with 933 records. We 
computed P_k for each G_k (1 ≤ k ≤ 341) and obtained $\sum_{1 \le k \le 341} |P_k| = 1{,}315$. 


|G|      Number of groups    Percentage (%) 
2        8,479               73.67 
3        1,129               9.81 
4        546                 4.74 
5        388                 3.37 
6        260                 2.26 
7        162                 1.41 
8        136                 1.18 
9        83                  0.72 
10       103                 0.89 
11       47                  0.41 
12       39                  0.34 
13       24                  0.21 
14       13                  0.11 
≥15      100                 0.8 
Total    11,509              100 


Table 1: Distribution of sizes of groups 


5.2 Method 


We first generated the tasks for A1 and B1 for the original data set to compare the number of generated 
tasks. Since our purpose was not to find the best parameters, we set m to three in this experiment. Finding 
the best parameters is an important future study. Then, we constructed sets of tasks for A1 and B1 for the 
groups with sizes more than two, because both A1 and B1 generate the same set of tasks if the size of the 
group is two. We carefully examined the records and found that the set contained several inappropriate 
records, which had set-ISBNs. Set-ISBNs are ISBNs assigned not to books but to sets of books, and are not 
intended to be used to identify individual books. We removed 27 pairs with set-ISBNs from $\bigcup_k P_k$. 
Then, we manually created the answer set result_ans of the human-powered join, i.e., the set of record pairs 
representing misidentifications. As a result, we got $\sum_{1 \le k \le 341} |P_k| = 1{,}034$, 
|result_ans| = 737, and $\sum_{1 \le k \le 341} |P_k| - |result_{ans}| = 297$. In 
Section 5.3, we use the former set of pairs to evaluate the elapsed time, while we use the latter set to 
evaluate the quality of results. Then, we inserted the generated tasks into the Crowd4U task pool for 
crowdsourcing. Finally, we used the results of A1 and B1 tasks to compute the results of tasks for A2 and 
B2, by removing the tasks to be removed by the A2 and B2 algorithms. Note that this is possible because each 
task is independently performed on the crowdsourcing platform, and removing some tasks does not affect 
the results of others. 


5.3 Results and Discussions 


This section compares the four variations in terms of the number of generated tasks, elapsed time for 
performing tasks, and quality of results. 


Number of tasks. Figure 11 compares the number of generated tasks. As expected, both simultaneous 
comparison and contraction are effective in reducing the number of tasks. Compared with A1 tasks, A2, 
B1, and B2 tasks were reduced by 27.5%, 43.7%, and 55.7%, respectively. 


Figure 11: Number of generated tasks 


Elapsed time. Figure 12 shows the elapsed times required for performing the tasks. The figure shows the 
averages, medians, and modes of elapsed times for performing A1 and B1 tasks (times for A2 and B2 are 
omitted because they are the same as those for A1 and B1). We excluded two tasks for which we failed to 
log the elapsed times. Note that in Figure 12, there are large differences between the 
average times and the medians. This suggests that there are outliers. The most likely reason is that workers 
often performed other jobs while performing a task. Therefore, we applied the technique proposed by Tukey 
(1977) to remove outliers. The technique uses the box-and-whisker plot and the interquartile range (IQR), 
i.e., the difference between the first and third quartiles, as a parameter. We used 1.5 × IQR to detect 
outliers. A1 and B1 (excluding outliers) in Figure 12 show the times after removal. 
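
A minimal Python sketch of this kind of outlier removal is given below. It assumes the common reading of Tukey's rule, in which values outside [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR] are discarded; the quartile convention (the default of `statistics.quantiles`) and the sample times are assumptions, not details taken from the experiment.

```python
from statistics import quantiles

def remove_outliers(times, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(times, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [t for t in times if low <= t <= high]

# Invented elapsed times in seconds; 120 mimics a worker who was interrupted.
elapsed = [3, 4, 4, 5, 5, 6, 7, 120]
cleaned = remove_outliers(elapsed)
```

Different quartile conventions shift the fences slightly, so borderline values may be classified differently depending on the method chosen.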

Overall, the figure suggests that the elapsed time for A1 tasks is shorter than that for B1 tasks. 
This is because B1 tasks require workers to perform simultaneous comparisons, which are more difficult than 
pairwise comparisons. However, the elapsed times for both types of tasks are well below 10 s, and 
short enough for microtask-based crowdsourcing. 


Figure 12: Elapsed times for performing tasks (sec) 


Quality of the results. Finally, we compared the quality of the results. Table 2 shows the results of 
human-powered joins with the four task-generation strategies. Here result_s is the set of record pairs 
determined by workers to be misidentifications with the task-generation strategy s. We then computed recall, 
precision, and F-measures using result_ans. 

Figure 13 shows the recalls and precisions obtained by the human-powered joins with the four 
task-generation strategies. The results show that all recalls and precisions are more than 0.85 and 0.9, 
respectively. Table 3 shows the F-measures, all of which are above 0.9. Figure 14 compares the number of 
required tasks and F-measures, where each point corresponds to a task-generation strategy. 
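
For concreteness, the following Python sketch shows how recall, precision, and the F-measure can be computed from a result set and an answer set of record pairs. It uses the standard definitions (with the balanced F-measure as an assumption about which F-measure is meant), and the toy sets are invented and do not correspond to the experimental data.

```python
def evaluate(result_s, result_ans):
    """Return (precision, recall, F-measure) for two sets of record pairs."""
    true_positives = len(result_s & result_ans)
    precision = true_positives / len(result_s) if result_s else 0.0
    recall = true_positives / len(result_ans) if result_ans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Toy example: pairs are (tid, tid') tuples.
result_s = {(1, 2), (1, 3), (2, 3)}
result_ans = {(1, 2), (1, 3), (3, 4), (2, 4)}
precision, recall, f_measure = evaluate(result_s, result_ans)
```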


Strategy s    |result_s|    Σ_k |P_k| − |result_s| 
A1            674           360 
A2            692           342 
B1            660           374 
B2            680           354 
Answer Set    737           297 


Table 2: Number of tuples in the result 


Figure 13: Recalls and precisions 


             A1       B1       A2       B2 
F-measure    0.933    0.934    0.931    0.902 


Table 3: F-measures 


Figure 14: F-measures and #tasks 


The result shows that if we directly used the result, the quality of the result of B2 tasks would be lower 
than those of the others. More precisely, tests for differences among the proportions show that the differences 
between recalls of A2 and B2 are significant at the 5% significance level. For precisions, the differences for 
pairs (B1, A2), (A2, B2), (B1, B2) and (A1, B2) are significant at the 5% significance level. Note that the 
result does not suggest a negative correlation between the number of tasks and data quality. In our 
experiment, the results of B2 (A2) tasks are derived from those of B1 (A1) tasks. When we adopt the 
contraction technique, the quality of the result is heavily affected by the results of microtasks performed in 
the earlier stage. Therefore, the results imply that the quality of the results of the earlier B1 microtasks was 
lower than that of A1 tasks. We can conclude that it is important to guarantee the quality of task results 
in the early stage of the process. 

In reality, however, it is rare to directly adopt the result of a task performed by a single worker. It is 
more typical to integrate results by majority voting. Then, the differences in quality become much 
smaller. For example, if we assume that the probability is uniform and each task is performed by three 
workers, we can expect that the F-measures of A1, A2, B1, and B2 are between 0.9991 and 0.9996. B2 
outperforms the others in the sense that it significantly reduces the number of tasks and its output is 
comparable to that of the other strategies. Our statistical analysis showed no significant difference in 
precision and recall if we use majority voting by five and seven workers, respectively. 
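
The general effect of majority voting can be illustrated with a simple binomial model in Python. This is only a sketch under assumptions that are not taken from the experiment: workers answer independently, every worker is correct with the same probability p, and the number of workers is odd. It is not meant to reproduce the F-measures reported above.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent workers is correct (n odd)."""
    needed = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(needed, n + 1))

# Hypothetical per-worker accuracy of 0.9.
single = majority_accuracy(0.9, 1)
three = majority_accuracy(0.9, 3)
five = majority_accuracy(0.9, 5)
```

Under this model, whenever p exceeds 0.5 the accuracy of the majority increases with the number of workers, which is consistent with quality differences between strategies shrinking after voting.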


6 Summary 


This paper applied microtask-based crowdsourcing to finding misidentifications in the results of automatic 
identification of bibliographic records. We modeled the problem as a human-powered join and considered 
four variations of task-generation strategies in a design space defined by two criteria. The first is the number 
of records to be compared at once, and the second is whether we apply contraction, a novel technique to 
optimize human-powered joins. We compared the four task-generation strategies using bibliographic records of 
the NDL. The experimental results showed that one of the strategies reduced the number of tasks by 55.7% 
relative to the baseline strategy, and statistical analysis showed that the quality of its result was comparable 
to that of the other three strategies. 

Future studies include the detailed analysis of the design space with different parameter settings. 
It also includes the development of a method to incorporate techniques to improve data quality, and the 
development of a method to combine the power of experts and crowdsourcing, where disputed results are 
passed to experts for detailed verification. 


Abstract 

As health care information proliferates on the web, the content quality is varied and difficult to assess, 
partially due to the large volume and the dynamicity. This paper reports an automated approach in 
which the quality of depression treatment web pages is assessed according to evidence-based depression 
treatment guidelines. A supervised machine learning technique, specifically Naive Bayes classification, is 
used to identify the sentences that are consistent with the guidelines. The quality score of a depression 
treatment web page is the number of unique evidence-based guidelines covered in this page. Significant 
Pearson correlation (p<.001) was found between the quality rating results by the machine learning 
approach and the results by human raters on 31 depression treatment web pages in this case study. The 
semantic-based, machine learning quality rating method is promising and it may lead to an efficient and 
effective quality assessment mechanism for health care information on the Web. 


1 Introduction 


The last decade has witnessed a dramatic expansion in the amount of publicly available health care 
information on the World Wide Web, and the use of web health care information has become popular 
among both health care professionals and patients. National surveys by the Pew Institute since 2003 show 
that over 80% of online users in the United States look for advice or information about health or health care 
(Pew Internet and American Life Project, 2003-2011), despite a 33% increase in Web users (Internet News, 
2003; Internet World Stats, 2012). 

Although online health care information is widely accessed, the quality of health care information 
on the web varies in terms of accuracy, coverage and currency (Eysenbach et al. 2002; Kunset et al. 2002; 
Griffiths & Christensen, 2005). Surprisingly, information consumers themselves make little effort to verify 
the information (Eysenbach & Köhler, 2004; Pew Internet and American Life Project, 2006). Misinformation 
on the web could cause and indeed has caused life-threatening accidents (Crocco et al, 2002; Kiley, 2002). 
Because of the potential harm that may be caused by inaccurate information, the quality assessment of 
health care information on the web stays a common interest of various health care information stakeholders, 
including e-health policy makers, information providers/consumers, and information search service 
providers. 

The large amount of health care information on the web can easily overwhelm the capacity of any 
manual evaluation system. An automated quality assessment mechanism is therefore needed. In particular, we 
are interested in quality assessment based on evidence-based health care guidelines. 


iConference 2014 Yanjun Zhang et al. 


Evidence-based medicine has been advocated in health care since the original model was presented 
in the Journal of the American Medical Association (EBMWG, 1992). Many evidence-based clinical practice 
guidelines have been established under the sponsorship of governmental agencies such as the Agency for 
Healthcare Research and Quality (AHRQ, 2011) in the United States. The guidelines are established based 
on the systematic review of scientific evidence in health care and medical literature by a multi-disciplinary 
panel including methodologists, medical experts and scientific reviewers. 

In this paper, we propose and evaluate a supervised machine learning approach to rate the 
information quality of health care web pages based on their content. In our approach, quality of health care 
web pages is assessed through semantically comparing the text content with evidence-based health care 
practice guidelines. Content accuracy of a web page is assessed through looking for positive matching 
between the web page content and any of the health care guidelines; the content coverage is assessed by 
identifying the number of guidelines covered by this web page. According to a systematic review (Eysenbach 
et al., 2002) based on 79 distinct health information quality evaluation studies, accuracy and coverage are 
the two most commonly used measures for assessing information quality from the content perspective. We have 
developed two semantic-based quality rating approaches: one is a knowledge engineering based approach, 
and the other is a supervised learning method. The former is reported elsewhere (Zhang et al., 2013). We 
report the latter in this paper. As a proof-of-concept case study, the health care guidelines on depression 
treatment (Appendix C) are used in this research. 

This paper is structured as follows. After describing the method, data, and experimental design, we 
report evaluation results and discuss the results and limitations of the research. A comparison of this work 
with other web information quality assessment research is then presented. We conclude the paper with 
plans for future work. 


2 Method 


We cast the quality rating problem as a sentence classification problem. If a sentence in a web page is 
consistent with a depression treatment guideline, it is considered as a match (i.e. “positive”) and the quality 
score of the web page is increased by 1. A Naive Bayes (NB) classifier is trained to perform the classification 
on a semantic representation of the sentences. 


Naive Bayes Classifier: A Naive Bayes classifier is a supervised classification algorithm. The instances to 
be classified (i.e., test instances) are each represented as a vector of features. The training instances are 
vectors with class labels, which in our case are “positive” and “negative”. Through a training process, a 
Naive Bayes classifier constructs a probabilistic model that can be used to classify new input instances. The 
training and test instances are sentences from web pages, and each sentence is represented as a set of 
features (also called “attributes”). When constructing the probability model, a Naive Bayes classifier 
assumes that all features are independent of each other given the class (i.e., “conditional 
independence”). Because of the independence assumption, the parameters for each feature can be learned 
separately, which greatly simplifies learning and makes the algorithm efficient on large feature spaces. The 
Naive Bayes model has many variations. In this study, we used the multivariate Bernoulli event model that 
is implemented in WEKA (Witten et al., 2011). The same model has been used for text classification in 
numerous studies (McCallum & Nigam, 1998; Billsus & Pazzani, 1999; Schneider, 2003; Chen et al., 2009). 
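
To make the multivariate Bernoulli event model concrete, the following is a minimal Python sketch rather than the WEKA implementation used in the study. Each sentence is assumed to be a set of semantic tags, every tag in the vocabulary contributes either its presence or its absence probability, and Laplace smoothing is an assumption of the sketch. The function names and the toy tags are hypothetical.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(instances, labels, vocabulary):
    """Estimate class priors and per-tag Bernoulli parameters.

    instances: list of sets of semantic tags (one set per sentence)
    labels: list of class labels ("positive" / "negative")
    vocabulary: set of all tags used as features
    """
    class_counts = defaultdict(int)
    tag_counts = defaultdict(lambda: defaultdict(int))
    for tags, label in zip(instances, labels):
        class_counts[label] += 1
        for tag in tags & vocabulary:
            tag_counts[label][tag] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    # Laplace smoothing (an assumption of this sketch) keeps every
    # probability strictly between 0 and 1.
    cond = {
        c: {t: (tag_counts[c][t] + 1) / (class_counts[c] + 2) for t in vocabulary}
        for c in class_counts
    }
    return priors, cond


def classify(tags, priors, cond, vocabulary):
    """Return the class with the highest log posterior for one sentence."""
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for t in vocabulary:
            p = cond[c][t]
            # Conditional independence: each tag contributes separately,
            # whether it is present or absent in the sentence.
            score += math.log(p if t in tags else 1.0 - p)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy data with hypothetical tags, for illustration only.
train = [
    {"antidepressant", "effective", "treat"},
    {"antidepressant", "effective", "depressive disorder"},
    {"clinic", "university"},
    {"cancer", "clinic"},
]
train_labels = ["positive", "positive", "negative", "negative"]
vocab = set().union(*train)
priors, cond = train_bernoulli_nb(train, train_labels, vocab)
print(classify({"antidepressant", "effective"}, priors, cond, vocab))
```

Because the parameters of each tag are estimated independently, training is a single counting pass over the labelled sentences.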


Feature Space Construction: As in many other supervised classification applications, selecting the 
features to represent the instances for the Naïve Bayes classifier is arguably the most critical aspect of the 
algorithm design. Our feature space construction module carries out the following procedure to construct 
the feature space: 

• Semantic tagging of original sentences. This step transforms original sentences to vectors of semantic 
  tags. These semantic tags are candidate features. 
• Feature space reduction. 
    o Remove semantic tags representing numerical values or web URLs (i.e., tags containing 
      “www”), as they are not relevant to quality assessment (see Table 1 for some examples). 
    o Further prune the feature dimensions in a vector space. 


Original Text (underlined part) | Semantic Tag Result (enclosed by square brackets) | POS Tag 
telephone:805-967-7636 | [805], [967], [7636] | number 
by the 10th postnatal day | [10th] | number 
http://www.healthlinkbc.ca/kbase/as/tb1939/actionset.htm | [a actionsetcahealthlinkbchtmkbase tb1939 www] | unknown 


Table 1: Examples of noisy tags to be removed 
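
The noisy-tag removal illustrated in Table 1 can be sketched as a simple filter. This is an illustration only: the regular expression for numerical tags (digits with an optional ordinal suffix) is an assumption, since the paper identifies numerical values through the tagger output, and the function name is hypothetical.

```python
import re

# Assumption: a tag is "numerical" if it consists of digits with an
# optional ordinal suffix, e.g. "805" or "10th".
NUMERIC_TAG = re.compile(r"^\d+(st|nd|rd|th)?$")


def remove_noisy_tags(tags):
    """Drop tags that are numerical values or that contain "www" (web URLs)."""
    return [t for t in tags if not NUMERIC_TAG.match(t) and "www" not in t]


print(remove_noisy_tags(["telephone", "805", "967", "7636"]))
print(remove_noisy_tags(["10th", "postnatal", "day"]))
```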


Although the semantic representation of a sentence includes three types of data: semantic tags, POS tags, 
and term positions, this study only uses semantic tags to form the feature space for the Naive Bayes 
classifier. Other data will be utilized in future studies to improve classification performance. Next, we 
describe the semantic tagging and feature dimension reduction in vector space in detail. 


Semantic Tagging of Sentences: NB is often applied naively to text represented as a “bag 
of words”. In our approach, instead of using the original sentences, a semantic representation of each sentence 
is created by using three tools: MetaMap API, TaggerClient (v2.4.c) and Lexical Variant Generator (LVG) 
(McCray et al. 1994; National Library of Medicine, 2009). Semantic tagging is carried out after a sentence 
cleaning process in which citation notations and other extraneous text and symbols are removed. 

MetaMap API tags nouns and noun phrases with their matching semantic concepts in the UMLS 
Metathesaurus (UMLS, 2009). Each successful match comes with a confidence score. For example, the nouns 
“depression” and “depressive illness” are both mapped to the semantic concept “depressive disorder” with a 
score of 1000. TaggerClient is a part-of-speech tagger for the biomedical domain. LVG is a lemmatization tool. 
Both are used to control lexical variations and map a word to a preferred synonym. For instance, “ceases” 
and “cease” both have the preferred synonym “stop”. 

As MetaMap API focuses on medical concepts expressed as nouns and noun phrases, TaggerClient 
and LVG are additionally used to control the lexical variations of verbs, adjectives, adverbs, and the nouns 
that are not tagged by MetaMap API. LVG tags of other word groups such as articles and prepositions 
are less useful for shallow semantic analysis and hence are ignored in this study. In 
addition, when a noun or noun phrase has both a MetaMap tag and an LVG tag, the MetaMap tag is taken 
only when the confidence score is greater than 850 (set empirically); otherwise, the LVG tag is taken as the 
semantic tag. After semantic tagging, a sentence is transformed from its original textual representation 
to a vector of semantic tags. 
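
The rule for choosing between a MetaMap tag and an LVG tag can be written as a small function. This is a sketch, not the authors' code: the function and parameter names are hypothetical, and using the MetaMap tag when no LVG tag exists is an assumption, since the paper only describes the case where both tags are available.

```python
METAMAP_THRESHOLD = 850  # confidence threshold reported in the paper (set empirically)


def choose_semantic_tag(metamap_tag, metamap_score, lvg_tag):
    """Pick the semantic tag for a noun or noun phrase.

    The MetaMap tag is used only when its confidence score is greater
    than the threshold; otherwise the LVG tag is used.
    """
    if metamap_tag is not None and (lvg_tag is None or metamap_score > METAMAP_THRESHOLD):
        # Assumption: with no LVG tag available, the MetaMap tag is kept
        # regardless of its score.
        return metamap_tag
    return lvg_tag


print(choose_semantic_tag("depressive disorder", 1000, "depression"))
```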

Feature Dimension Reduction in a Vector Space: We observe that a depression treatment web page 
may discuss other aspects of depression, for example, causes, self-diagnosis, research groups, or useful 
resources. Features associated with these aspects are less useful for identifying positive sentences. Further, 
due to the content variations across web pages, these features are also more likely to have a value of 0 in many 
sentence vectors, making them less useful for identifying negative sentences as well. Removing these 
dimensions would address the sparse feature space problem, improve parameter estimation in Naive Bayes 
classifiers (Kim et al., 2005), and reduce computational cost. 

The goal of feature dimension reduction is to identify and remove these less useful features. This is 
achieved through the following steps: 


a) Project vectors of training instances onto a multi-dimensional vector space. The number of 
dimensions of the vector space is determined by the number of unique semantic tags in all the 
sentence vectors after the removal of numerical and web URL tags. 

b) For each sentence vector, measure its cosine similarity with all positive training instances, and take 
the maximal similarity as the score for this vector. 

c) Sentence vectors with a similarity score greater than 0.5 (threshold set empirically) are selected to 
form a reference group. Features with a value of 0 across all vectors in the reference group are 
removed. 
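
Steps a) to c) can be sketched as follows. The sketch assumes binary sentence vectors stored as sets of tags, so that cosine similarity reduces to the overlap divided by the geometric mean of the set sizes. The function names and the toy data are hypothetical; only the threshold of 0.5 comes from the text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as sets of tags."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def reduce_features(sentences, labels, threshold=0.5):
    """Return the set of tags (dimensions) kept after reduction."""
    positives = [s for s, y in zip(sentences, labels) if y == "positive"]
    # Step b: score each sentence by its maximal similarity to any positive instance.
    # Step c: sentences scoring above the threshold form the reference group.
    reference = [
        s for s in sentences
        if max((cosine(s, p) for p in positives), default=0.0) > threshold
    ]
    # A tag that is 0 in every reference vector is dropped, so the kept
    # dimensions are exactly the tags occurring in the reference group.
    return set().union(*reference)


# Toy data with hypothetical tags, for illustration only.
sentences = [
    {"antidepressant", "effective", "treat"},
    {"antidepressant", "effective", "cancer"},
    {"clinic", "university"},
]
labels = ["positive", "negative", "negative"]
kept = reduce_features(sentences, labels)
print(sorted(kept))
```

In this toy case the second sentence is similar enough to the positive instance to join the reference group, so its tags survive, while the tags of the unrelated third sentence are removed.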


Feature space reduction proved effective. For depression treatment guideline #1, the number of dimensions was reduced 
from 3963 to 635. A manual review of the removed dimensions confirmed that their corresponding semantic 
concepts are semantically unrelated to depression treatment. Typical examples of these concepts are 
cancer, clinic, and university. 


Quality Score of Web Pages: A Naive Bayes classifier is trained for each depression treatment guideline 
using the corresponding training examples. The training instances take the form of vectors of semantic tags. 
The number of trained classifiers equals the number of guidelines. The trained classifiers are then applied 
to each of the test instances one by one. Test instances are in the same format as the training examples, 
except that test instances do not have class labels. Each classifier classifies a test instance as either 
positive or negative. If it is positive, the web page containing the sentence scores one for that guideline. The quality score for 
any web page therefore ranges from 0 to the number of guidelines. Figure 1 shows the logic flow of the system. 
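
The page-level scoring described above can be sketched as counting the guidelines for which at least one sentence is classified as positive. The classifiers here are hypothetical stand-ins (simple keyword checks) used only to make the sketch runnable; in the study they are trained Naive Bayes models operating on semantic tag vectors.

```python
def page_quality_score(sentences, classifiers):
    """Count the guidelines matched by at least one sentence of a page.

    sentences: list of sentence representations for one web page
    classifiers: dict mapping a guideline id to a function that returns
                 True when a sentence is classified as positive
    """
    matched = {
        guideline
        for guideline, is_match in classifiers.items()
        if any(is_match(s) for s in sentences)
    }
    return len(matched), matched


# Hypothetical stand-in classifiers, for illustration only.
classifiers = {
    "#1": lambda s: "effective" in s,
    "#6": lambda s: "side effects" in s,
}
page = [
    "Antidepressants are effective.",
    "They are effective for many people.",
    "See your doctor.",
]
score, matched = page_quality_score(page, classifiers)
print(score, matched)
```

Several positive sentences for the same guideline still contribute only one point, which is why the score is bounded by the number of guidelines.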




[Figure 1 flow chart, two panels. 
Panel 1, Semantic Tagging: Web document; Read in the text of document; Processing; MTTx Tagging; LVG Tagging; Merge Tagging Results; Semantic Tagging Result. 
Panel 2, Naïve Bayes Classifier Construction: Semantic Tagging Result (Training Data); Under-Sampling of Negative Cases*; Clean Noisy Tags and Form Semantic Tag Set; Create Vector Space Model & Transform Sentences into Semantic Vector; Dimensionality Reduction of Vector Space Model; Construct Naïve Bayesian Model; NB Classifier.] 

* The under-sampling of negative cases is applied during the processing of guideline #1 only, i.e. randomly selecting 
every 6th negative sentence from the training web pages and using these sentences as the negative training data set. 

Purpose: Guideline #1 appears in more web pages than other guidelines and thus has a larger data set. The under-sampling 
of negative cases was applied to avoid out-of-memory errors during the execution of WEKA (coded in Java). 


Figure 1: Process flow charts for semantic tagging & Naive Bayes classification 


3 Data 


The corpus for this study comprised a total of 201 web pages on the topic of depression treatment (Appendix 
A). The sample data were obtained from multiple sources (Table 2) in May 2009. For search engines, 
“depression treatment” was used as the query and the first 30 returned web pages from each search engine 
were collected as candidates. For web portals, candidate pages were collected from depression treatment 
related sections only. Candidate pages were examined manually to remove duplicate pages and pages that 
were inappropriate for other reasons (see Appendix B). In the end, 201 web pages were selected to form the 
corpus. 

General Web Search Engines | Medical Search Engines | Health care Web Portals 
Google | OmniMedicalSearch | Medline Plus in United States 
Microsoft Bing Search | MedNar | HealthLinkBC in Canada 
Ask.com | WebMD | HealthInsite in Australia 
AOL | | National Health Service (NHS) in United Kingdom 


Table 2: Sources for constructing the corpus 


Training and test data: Two human raters were hired to identify positive and negative sentences in the 
201 web pages. In addition, they were required to highlight in the positive sentences the key phrases that 
led them to a positive identification. A five-hour rating workshop was held for the human raters to learn 
the evidence-based depression treatment guidelines. The intra-class correlation coefficient (ICC) between 
the web page quality scores assigned by the two raters across all guidelines was .990, with a 95% confidence 
interval between .979 and .995, as measured by the single-measure ICC value, i.e. ICC(3,1). The discrepancies 
were resolved through discussion between the raters. 

The quality of each of the 201 pages was rated by the human raters using the health care guidelines. 
The score of a page is the number of unique guidelines reflected by the page. The scores ranged from 0 to 
8. The 201 web pages were divided into 5 bins (i.e., those with a quality score of 0, 1-2, 3-4, 5-6 or 7-8). 
Stratified random sampling was used to select 31 test web pages (2677 sentences in total). The remaining 170 
pages were used as the training set. 
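
The binning and stratified sampling can be sketched as below. The five bins follow the text; proportional allocation within each bin, the sampling fraction, and all function names are assumptions of this sketch, because the paper does not state how the 31 test pages were distributed across the bins.

```python
import random


def score_bin(score):
    """Map a quality score (0 to 8) to one of the five bins: 0, 1-2, 3-4, 5-6, 7-8."""
    return 0 if score == 0 else (score + 1) // 2


def stratified_split(page_scores, test_fraction, seed=0):
    """Split page ids into train and test sets, sampling within each bin.

    page_scores: dict mapping page id to human quality score
    test_fraction: share of each bin sent to the test set (an assumption)
    """
    rng = random.Random(seed)
    bins = {}
    for page_id, score in page_scores.items():
        bins.setdefault(score_bin(score), []).append(page_id)
    test = []
    for b in sorted(bins):
        members = sorted(bins[b])
        k = round(len(members) * test_fraction)
        test.extend(rng.sample(members, k))
    train = sorted(set(page_scores) - set(test))
    return train, sorted(test)


# Hypothetical scores for 90 pages, for illustration only.
scores = {page_id: page_id % 9 for page_id in range(90)}
train_ids, test_ids = stratified_split(scores, test_fraction=0.2)
print(len(train_ids), len(test_ids))
```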


Depression Treatment Guidelines 1, 6 and 12-B: This research used a subset of the 20-item evidence- 
based depression treatment guidelines used in Griffiths and Christensen (2005). Appendix C lists these 
guidelines. When a guideline contains multiple semantic propositions, it is necessary to split it into multiple 
guidelines. For instance, guideline #12, “abrupt cessation of antidepressant can cause discontinuation 
syndrome and that antidepressants should not be stopped suddenly”, makes two statements: 


a) Antidepressants should not be stopped suddenly. 


b) Abrupt cessation of antidepressants can cause discontinuation syndrome. 


Since it is possible for only one of the two points to be mentioned in a sentence, guidelines like this can 
potentially cause discrepancies among human raters when creating training examples and reviewing test 
results. To avoid this problem, guideline #12 was split into guidelines 12-A and 12-B. 

Due to the skewed distribution of the guidelines in the corpus, some guidelines (e.g. #18, 19) end 
up with too few positive examples (n < 5) for training the Naïve Bayes classifier. However, we were able to 
select three guidelines, #1, 6, and 12-B, of varied semantic complexity to test the proposed approach. Each of these 
guidelines has at least 50 positive training examples found by human raters from different web pages, 
so the positive data set is reasonably large for training a classifier. 


4 Results 


After the Naïve Bayes classifiers were trained with the 170 web pages, they were used to classify the 2677 
sentences in the 31 test pages. In this section, we report the classifiers’ sentence classification performances, 
as well as the web page quality rating performance. 

Table 3 lists the performance of the machine learning approach for each individual guideline as 
measured by precision, recall, and accuracy. The following equations are used for calculating these measures. 
TP stands for the number of true positives identified by the classifier; FP stands for the number of false positives; 
FN stands for the number of false negatives; and TN stands for the number of true negatives. 


Equation 1: Precision = the proportion of true positives (TP) over tested positives 
= TP / (TP + FP) 
Equation 2: Recall = the proportion of true positives (TP) over actual positives 
= TP / (TP + FN) 
Equation 3: Accuracy = the proportion of correctly identified sentences over all sentences 
= (TP + TN) / (TP + FN + FP + TN) 
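
As a worked check of Equations 1 to 3, the short Python sketch below applies them to the guideline #6 counts reported in Table 3 (TP = 16, FN = 3, FP = 5, TN = 2653). The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, accuracy


# Counts for guideline #6 as reported in Table 3.
precision, recall, accuracy = classification_metrics(tp=16, fp=5, fn=3, tn=2653)
print(f"precision={precision:.1%} recall={recall:.1%} accuracy={accuracy:.1%}")
# prints "precision=76.2% recall=84.2% accuracy=99.7%"
```

The four counts sum to 2677, the number of test sentences, and the high accuracy mostly reflects the large number of true negatives.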


Depression treatment guideline | Human classification | Machine learning classification (Y) | Machine learning classification (N) | Recall | Precision | Accuracy 
#1 | Y | 42 | 7 | 85.7% | 13.7% | 89.9% 
#1 | N | 263 | 2365 | | | 
#6 | Y | 16 | 3 | 84.2% | 76.2% | 99.7% 
#6 | N | 5 | 2653 | | | 
#12-B | Y | 11 | 2 | 84.6% | 28.9% | 98.9% 
#12-B | N | 27 | 2637 | | | 


Table 3: Performance of sentence classification by machine learning approach 


Testing Page ID | Quality Score via Human Rating | Quality Score via Machine Learning Rating | Quality Score Difference (machine learning vs. human rating)* 
[Rows for the 31 testing pages are not reproduced here.] 
Total | 44 | 55 | Not Applicable 

Note: The quality score was assigned based on guidelines #1, #6, and #12-B only. 
* The quality score difference = quality score via machine learning - quality score via human rating. 


Table 4: Quality score assigned to testing web pages for guidelines #1, #6, and #12-B 


The quality scores generated by the machine learning approach using guidelines #1, 6 and 12-B are listed 
in Table 4. The quality scores range from 0 to 3 (i.e., a page may match 0 to 3 guidelines), and a total of 
55 occurrences of the guidelines were identified from the 31 test pages. Figure 2 shows the overlap between 
the machine learning results and human rating results. 


[Figure 2 diagram: 0 guidelines were missed by the machine learning rating approach; 11 guidelines (20%) identified by the 
machine learning rating approach were not accepted by human raters.] 


Figure 2: Identified depression treatment guidelines (#1, #6, and #12-B) 


The linear correlation between machine learning based quality scores and the evidence-based human rating 
quality scores was high and statistically significant (r = 0.841, r² = 0.707, p < .001, n = 31, see Figure 3). 

[Figure 3 scatter plot: The Relationship between Machine Learning Rating Score and Evidence-based Human Rating Score. 
* The number of duplicate points] 


Figure 3: Relationship between machine learning quality rating scores and human rating quality scores 


5 Discussion 


The sentence classification results (Table 3) show that for all three guidelines, the recalls are above 84%. 
This suggests that the machine learning approach can effectively identify the sentences reflecting these 
guidelines, despite the variations in natural language guideline expressions. Fairly complex sentences such 
as those presented in Figure 4 were correctly identified as positive in different test pages. 

Guideline #1: 
“Antidepressant medication is an effective treatment for major depressive 
disorder.” 

Matched sentence: 


1. SSRIs affect mainly serotonin and have been found to be effective in 
treating depression and anxiety without as many side effects as some older 
antidepressants. 


Guideline #6: 


“The side effect profile varies for different antidepressants.” 


Matched sentences: 


2. SSRIs and SNRIs are more popular than the older classes of antidepressants, 
such as tricyclics-named for their chemical structure-and monoamine oxidase 
inhibitors (MAOIs) because they tend to have fewer side effects. 

Side effects may vary depending on the medicine you take, but common ones 
include stomach upset, loss of appetite, diarrhea, feeling anxious or on 
edge, sleep problems, drowsiness, loss of sexual desire, and headaches. 
However, because TCAs tend to have more numerous and more severe side 
effects, they're often not used until you've tried SSRIs first without an 
improvement in your depression. 

The side effects vary depending on the type of antidepressant you take. 


Figure 4: Examples of correctly classified sentences 


These cases demonstrate that the machine learning system is able to successfully map text expressions to 
semantic concepts, including “SSRI” - “antidepressant”, “treating” - “treat”, and “depression” - “major 
depressive disorder”. In addition, these examples include two different ways for expressing the meaning of 
guideline #6. One says directly that side effects “vary” depending on antidepressants; the other one 
indicates variation by a discussion of “fewer/more” side effects between antidepressants. In both cases, the 
machine learning approach successfully identified that the sentences are in concordance with the rating 
guideline #6. 

While all the guideline occurrences identified by human raters were successfully identified by the 
machine learning approach, the machine learning approach identified 11 false positive occurrences (Figure 
2). False positive errors may occur when the semantics of a text segment is taken for that of the entire sentence. 
For example, because the sentence in Figure 5 contains “your response to certain antidepressants”, the classifier mistakenly classified 
it as a match for guideline #1. This partially explains the low precision. 

Another limitation of the current implementation is that it uses individual sentences as the 
processing unit. Because of this, guidelines that are expressed across multiple sentences or as part of a bullet 
list in a web page are challenging to the Naive Bayes classifiers. Although these cases did not occur in this 
experiment, we acknowledge that they can lead to false negative results. 


Guideline #1: 


“Antidepressant medication is an effective treatment for major depressive 
disorder.” 


False Positive Match: 


10. The test, called the cytochrome P450, helps pinpoint genetic factors that 
influence your response to certain antidepressants (as well as some other 
medications). 


Figure 5: A false positive example 

The low precision and imperfect recall of sentence classification did not seem to have greatly affected the 
page rating performance. This is because a single guideline is commonly paraphrased more than once in a 
web page. Suppose the machine identified five sentences as positive instances of a guideline: as long as 
one of them is a true positive, the false positives have no effect on the machine-assigned 
quality rating score. Table 4 shows that for 21 of the 31 pages (67.7%) the machine learning quality scores 
and the human rating quality scores were identical. For 9 pages (29.0%) the machine learning quality scores 
were one higher than the human rating quality scores. For one page (3.2%) the difference between the 
machine learning and human rating quality scores was greater than one (2 in test page #28). The high 
linear correlation between human and machine ratings suggests that the proposed approach has the 
potential to evaluate the quality of online health care information (Figure 3). 
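
The page-level figures quoted above are mutually consistent, as the following short Python check illustrates. It uses only numbers stated in the text and in the totals of Table 4 (44 guideline occurrences by human rating, 55 by machine learning); the variable names are hypothetical.

```python
# Number of test pages by score difference (machine learning minus human rating),
# as stated in the text: 21 identical, 9 one higher, 1 two higher.
difference_counts = {0: 21, 1: 9, 2: 1}

n_pages = sum(difference_counts.values())
shares = {d: c / n_pages for d, c in difference_counts.items()}
extra_guidelines = sum(d * c for d, c in difference_counts.items())

print(n_pages)            # 31 test pages
print(extra_guidelines)   # 11, which equals 55 - 44
for d, share in shares.items():
    print(d, f"{share:.1%}")
```

The 11 surplus guideline occurrences are the false positives at page level, i.e. 20% of the 55 occurrences identified by the machine learning approach.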


6 Literature Review 


Much research has been done studying quality dimensions of web health care information (e.g. Bopp & 
Smith, 2000), establishing quality rating codes and indicators (e.g. Eysenbach et al., 2002; Griffiths & 
Christensen, 2002; URAC, 2007), and exploring automated rating mechanisms (Griffiths et al., 2005; 
Hawking et al., 2007; Wang & Liu, 2007). 

Earlier studies explored using accountability metadata of web pages, such as web page authorship, 
site sponsorship, and disclosure of the editorial board of a web site, as quality indicators (e.g. Chen et al., 2000; 
Smith, 2002; Barnes et al., 2009). This type of indicator asks how the information 
was presented or what meta-information was provided (Eysenbach et al., 2002). The correlation between 
information quality and web site/document linkage patterns such as hyperlinks (e.g. in-link counts to a 
website and Google’s page rank of the site home page) has also been explored (e.g. Frické et al., 2005; Griffiths 
& Christensen, 2005). These quality indicators seem to be domain independent; however, researchers have 
found that the association between these indicators and the content quality of web health care information 
is inconsistent across health care subjects, putting the validity and reliability of these non-content 
based indicators in question (e.g. Frické & Fallis, 2002; Frické et al., 2005; Griffiths et al., 2005). The major 
difference between our work and these studies is that our approach assesses the quality of information 
content directly and not through secondary attributes such as the disclosure of authorship and domains. 
Content-based quality rating thus fills a gap in information quality assessment and complements other 
rating criteria. 

The work reported here was inspired by Griffiths and Christensen (2002), who 
adopted a set of evidence-based depression treatment rules published by the Centre for Evidence-based 
Mental Health at Oxford (CEBMH, 1998) as the quality rating standard in their study. In their study, 
human raters used this standard to rate the quality of depression websites. The quality of a website was 
measured by the number of different treatment rules reflected in the website content. The greater the 
number, the higher the quality score a site was assigned. Griffiths and Christensen (2002) reported that 
the rating scores generated using evidence-based treatment guidelines were highly correlated (r=0.96, 
p<.001) with the quality scores of subjective ratings completed by health professionals. Using the same 
set of guidelines, Griffiths and Christensen (2005) proposed an automated 
(website) quality rating approach based on information retrieval techniques. 

Our work differs from Griffiths & Christensen (2005) in that 1) our quality rating is implemented 
based on matching text semantics while theirs is based on keyword distribution, and 2) our analysis is conducted 
at the sentence level, a finer granularity than the web document level. Because of the use of semantics, our 
approach is able to identify the specific health care guidelines presented in a web page. Consequently, the 
quality scores assigned to web pages can not only support ranking, but also justify the scoring results in a 
user-understandable manner. In addition to the quality score, a clear indication of 
which exact health care guidelines are presented in a web page can provide extra assistance for information 
consumers. They can select their reading based not only on the quality score, but also on the extent to which 
the web page content fits their individual information needs (e.g. focusing on medication treatment 
instead of psychotherapy) and knowledge background. Looking into the future, the semantic information 
extracted can contribute to the Linked Data clouds and the Semantic Web. 


7 Conclusion and Future Research 


We reported a prototype Naive Bayes based machine learning system that rates the quality of health care 
web pages according to the evidence-based health care guidelines. As a proof of concept, we used depression 
treatment for a case study. 

Our experimental results suggest that the semantics-based quality rating approach can produce 
quality scores comparable to human rating results. This is achieved by having computer programs 
conduct shallow semantic analysis on each sentence in depression treatment web pages, and then using the 
semantic tags of training sentences to develop the classifiers’ capability to identify the sentences that are in 
concordance with the health care guidelines. The identification of guideline-conforming sentences is treated 
as a binary classification problem. The classification and page rating performances (Tables 3 and 4, Figure 
3) attest to the effectiveness of the automatic quality rating approach. 

This study has some limitations. First, the automated quality rating system was tested for 
only 3 of the 20 evidence-based depression treatment guidelines. It would be interesting to test 
the performance of the semantics-based approach using more guidelines for quality rating. Second, due 
to limited research resources, the training and test data sets were kept relatively small so that the 
study of this prototype system remained manageable. A broader dataset, ideally also including other health 
care subjects, would further help establish the generalizability of the semantics-based approach. Additionally, 
the classifier performance can be evaluated with a more robust method such as cross-validation. 

In the future, in addition to overcoming the above limitations, we will further improve the sentence 
classification performance, especially the low precision. Sensitivity analysis can be done to 
optimize the trade-off between precision and recall. In the health care quality rating context, a practical 
classifier probably needs to be designed to be more conservative, as false positives may have worse 
consequences than false negatives, e.g. leading someone to trust a low quality web page. In future designs, the 
classifiers need not take only the semantic components into account, but may also apply 
constraints to screen out negative cases, for example, adding proximity constraints between a pair of 
semantic tags and taking negations into consideration in anti-guideline cases. These may be added 
to the feature space of the classifiers or used in other ways to reduce false positives. We will also work to 
identify statements that contradict approved health care guidelines. 


11 Appendix A: The corpus: 201 web pages 


ID URL 
1 http://www.mayoclinic.com/health/depression/DS00175/DSECTION=treatments-and-drugs 
2 http://www.emedicinehealth.com/depression/page5_em.htm 
3 http://www.emedicinehealth.com/depression/page6_em.htm 
4 http://www.emedicinehealth.com/depression/page7_em.htm 
5 http://www.finddepressiontreatment.com/ 
6 http://health.usnews.com/articles/health/healthday/2009/05/27/stigma-keeps-teens-from-depression-treatment.html 
7 http://www.mentalhealth.com/rx/p23-md01.html 
8 http://www.effexorxr.com/depression-anxiety-treatment.aspx 
9 http://www.waldenbehavioralcare.com/depression_treatment.asp 
10 http://www.healthlinkbc.ca/kbase/topic/special/hw30709/sec1.htm 
11 http://www.depressioncenter.org/treatments/cbt.asp 
12 http://www.depressioncenter.org/treatments/meds.asp 
13 http://www.depressioncenter.org/treatments/default.asp 
14 http://www.hypnosisdownloads.com/cat/depression-treatment.html 
15 http://www.medpagetoday.com/Psychiatry/Depression/14476 
16 http://www.nlm.nih.gov/medlineplus/ency/article/003213-htm 
17 http://www.healthlinkbc.ca/kbase/topic/special/hw30709/sec9.htm 
18 http://www.healthlinkbc.ca/kbase/topic/special/hw30709/sec10.htm 
19 http://www.webmd.com/anxiety-panic/features/alternative-depression-treatment-risks 
20 http://en.wikipedia.org/wiki/Clinical_depression 
21 http://www.mayoclinic.org/depression/treatment.html 
22 http://www.healthlinkbc.ca/kbase/topic/special/hw30709/sec11.htm 
23 http://www.sciencedaily.com/releases/2008/11/081130201928.htm 
24 http://online.wsj.com/article/SB119128055574245655.html?mod=health_home_stories 
25 http://www.medicalnewstoday.com/articles/81578.php 
26 http://www.healthlinkbc.ca/kbase/topic/special/hw30709/sec12.htm 
27 http://www.depression-guide.com/ 
28 http://psychologyinfo.com/depression/treatment.htm 
29 http://www.iampanicked.com/anxiety-articles/depression-treatment-methods.htm 
30 http://www.nimh.nih.gov/health/publications/depression/complete-index.shtml 
31 http://depressionandanxietyhelp.com/depression-treatment.html 
32 http://www.helpguide.org/mental/medications_depression.htm 
33 http://helpguide.org/mental/treatment_strategies_depression.htm 
34 http://safedepressiontreatment.com/ 
35 http://depressiontreatmentworks.org/ 
36 http://thedepressiontreatment.com/antidepressants/index.htm 
37 http://www.depression.com/treatment__tips.html 
38 http://depression-assistance.com/2006/07/30/depression-treatment/ 
39 http://depressiontreatment.net/ 
40 http://www.depression-guide.com/treatment-of-depression.htm 
41 http://www.depression-help-treatment.com/depression-medication.html 
42 http://www.depression-treatment-help.com/depression-treatment/depression-treatment.htm 
43 http://www.depression-treatment-help.com/ 
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http://health.yahoo.com/depression-treatment /depression-treatment-overview /healthwise-- 
aa25747.html 

http://www.depressiontreatmenthelp.org/depression_treatment.php 
http://www.psychologyinfo.com/depression 
www.psychologyinfo.com/depression/treatment.htm 

http: //mayoclinic.com/health/depression/DS00175 /DSECTION=treatments-and-drugs 
http://psychcentral.com/lib/2006/depression-treatment /all/1/ 

http: //www.drweil.com/drw/u/ART00696/depression-treatment 
http://www.mayoclinic.com/health/treatment-resistant-depression/DN00016 

http://www. healthlinkbc.ca/kbase/dp/topic/ty6745/dp-htm 

http: //en. wikipedia.org /wiki/Depression_and_natural_therapies 

http: //au.reachout.com/find/articles /depression-management-and-treatment-options 

http: //www.aboutourkids.org/families/disorders_treatments/az_disorder_guide/depression/treatment 
http://www.cancer.gov/cancertopics/pdq/supportivecare/depression/Patient /page4 
http://www.cancer.gov/cancertopics/pdq/supportivecare/depression/Patient/91.cdr 

http: //www.healthlinkbc.ca/kbase/topic/detail/drug/hw29716/detail.htm 
http://www.nimh.nih.gov/health/topics /child-and-adolescent-mental-health/antidepressant- 
medications-for-children-and-adolescents-information-for-parents-and-caregivers.shtml 

http: //www.aboutourkids.org/families/disorders_treatments/az_disorder_guide/depression/q 
uestions answers 

http://www. healthline.com/adamcontent /adolescent-depression 

http: //www.healthlinkbc.ca/kbase/topic/detail/drug/hw29398 /detail.htm 

http://www. healthline.com/adamcontent /depression-elderly 

http://www. healthline.com/adamcontent /major-depression-with-psychotic-features 
http://www.mayoclinic.com/health/depression-treatment /AN00685 
http://www.omhrc.gov/templates /news.aspx?ID=627661 

http: //www.aboutourkids.org/families/disorders_treatments/az_disorder_guide/depression/t 
reatment 

http://www.nlm.nih.gov/medlineplus/news/fullstory__82699.html 
http://www.ahrq.gov/clinic/epcsums/deprsumm.htm 

http: //www.oas.samhsa.gov/2k8/depression/depressionT X.cfm 
http://mednar.com/mednar//mednar /link.html?collectionCode=HEL- 
IMPRO&searchld=fdf05ca7-40a4-4el 8-af8e- 
3cdfe82088cek&type=RESULT_EMAIL&redirect Url=https%3A%2F %2F www.acponline.org%2 


Fatpro%2F timssnet %2Fimages%2F books%2Fsample%2520chapters%2F PsychCh05.pdf 
http://www.nlm.nih.gov/medlineplus/news/fullstory_ 82699.html 


http://www.healthlinkbc.ca/kbase/topic/detail/drug/hw29806/detail.htm
http://www.mayoclinic.com/health/alternative-medicine-side-effects/MY00682
http://www.healthlinkbc.ca/kbase/topic/detail/drug/hw29535/detail.htm
http://www.healthlinkbc.ca/kbase/dp/topic/zx3018/dp.htm
http://www.mayoclinic.com/health/depression-and-aging/MY00259
http://www.healthline.com/adamcontent/major-depression
http://www.healthline.com/galecontent/depression-1
http://www.medicinenet.com/script/main/art.asp?articlekey=52498
http://familydoctor.org/online/famdocen/home/common/mentalhealth/treatment/012.html
http://netwellness.org/healthtopics/depression/depressiontreatment.cfm
http://www.med.umich.edu/depression/treatment.htm
http://www.webmd.com/depression/understanding-depression-treatment
http://www.webmd.com/depression/guide/depression-treatment-options
http://www.webmd.com/depression/postpartum-depression/understanding-postpartum-depression-treatment
http://www.webmd.com/depression/psychotherapy-treat-depression
http://www.webmd.com/depression/treating-depression-medication
http://www.webmd.com/depression/pediatric-prozac
http://www.webmd.com/depression/news/20080221/hope-may-take-time-after-depression
http://www.webmd.com/depression/continuum-care-treatment-resistant-depression
http://www.webmd.com/depression/guide/treatment-resistant-depression-psychotherapy
http://www.webmd.com/depression/experimental-treatments-depression
http://www.healthlinkbc.ca/kbase/dp/topic/ty6886/dp.htm
http://sh-print.healthwise.net/moh/print/PrintTableOfContents.aspx?token=moh&localization=en-
http://sh-
http://www.healthlinkbc.ca/kbase/topic/detail/drug/zp2718/detail.htm
http://www.webmd.com/depression/medication-options
http://www.webmd.com/depression/news/20081201/which-kids-need-antidepressants
http://www.webmd.com/depression/ssris-myths-and-facts-about-antidepressants
http://www.healthlinkbc.ca/kbase/topic/major/tn9653/trtover.htm
http://www.webmd.com/depression/news/20080303/fda-oks-new-antidepressant-pristiq
http://www.healthlinkbc.ca/kbase/topic/major/tn9653/drugtrt.htm
http://www.webmd.com/depression/guide/optimizing-depression-medicines
http://www.webmd.com/depression/guide/chronic-illnesses-depression
http://www.healthlinkbc.ca/kbase/topic/major/tn9653/othertrt.htm
http://www.healthlinkbc.ca/kbase/topic/major/tn9653/hometrt.htm
http://www.webmd.com/depression/guide/depresssion-support
http://www.healthlinkbc.ca/kbase/topic/detail/drug/tn9670/detail.htm
http://www.healthlinkbc.ca/kbase/topic/detail/drug/tn9677/detail.htm
http://www.webmd.com/depression/adjusting-life-recovery
http://www.webmd.com/depression/guide/st-johns-wort
http://www.webmd.com/depression/guide/alternative-therapies-depression
http://www.webmd.com/depression/news/20080226/therapy-medication-switch-for-teen-depression
http://www.webmd.com/depression/guide/sexual-problems-and-depression
http://www.anxiety-and-depression-solutions.com/wellness_concerns/depression/depression_treatment.php
http://www.zoloft.com/depr_treatment.aspx
http://www.wdxcyber.com/psychotherapy.html
http://depression.emedtv.com/depression/depression-treatment.html
http://www.antidepressantsfacts.com/1995-12-Antonuccio-therapy-vs-med.htm
http://www.genf20.com/depression-treatment.html
http://depressiontreatment.net.au/
http://www.bayridgetreatmentcenter.com/depression.html
http://www.bodyhealthsoul.com/depression.htm
http://www.ayushveda.com/health/depression.htm
http://health.yahoo.com/depression-treatment/should-i-take-medications-to-treat-depression/healthwise--ty6745.html;_ylt=AkghCk5Z4QCEPGEI1CGHvw_EtcUF
http://www.nlm.nih.gov/medlineplus/antidepressants.html
http://effectivehealthcare.ahrq.gov/healthInfo.cfm?infotype=sg&DocID=10&ProcessID=7
http://familydoctor.org/online/famdocen/home/common/mentalhealth/treatment/012.printerview.h
http://www.nimh.nih.gov/health/publications/mental-health-medications/complete-
http://www.consumerreports.org/health/best-buy-drugs/antidepressants.htm
http://www.mayoclinic.com/print/antidepressants/HQ01069/METHOD=print
http://familydoctor.org/online/famdocen/home/common/mentalhealth/treatment/045.printerview.h
http://www.fda.gov/Drugs/DrugSafety/InformationbyDrugClass/ucm096305.htm
http://www.fda.gov/ForConsumers/ConsumerUpdates/ucm095980.htm
http://familydoctor.org/online/famdocen/home/common/mentalhealth/treatment/904.printerview.h
http://www.nimh.nih.gov/health/topics/child-and-adolescent-mental-health/antidepressant-medications-for-children-and-adolescents-information-for-parents-and-caregivers.shtml
http://www.mayoclinic.com/print/antidepressants/MH00059/METHOD=print
http://www.nimh.nih.gov/health/publications/depression-easy-to-read/index.shtml
http://www.nlm.nih.gov/medlineplus/tutorials/depression/mh019103.pdf
http://womenshealth.gov/faq/depression.cfm
http://www.fda.gov/ForConsumers/ByAudience/ForWomen/ucm118515.htm
http://www.nimh.nih.gov/health/publications/depression/how-is-depression-detected-and-treated.shtml
http://www.mayoclinic.com/print/depression/DS00175/DSECTION=all&METHOD=print
http://jama.ama-assn.org/cgi/reprint/300/18/2202.pdf
http://familydoctor.org/online/famdocen/home/common/mentalhealth/treatment/882.printerview.h
169 http://www.healthlinkbc.ca/kbase/as/tb1939/what.htm
170 http://apahelpcenter.org/articles/article.php?id=52
171 http://www.mayoclinic.com/print/psychotherapy/MY00186/METHOD=print&DSECTION=
172 http://www.healthlinkbc.ca/kbase/nci/ncicdr0000062806.htm
173 http://nccam.nih.gov/health/stjohnswort/sjw-and-depression.htm
174 http://www.mayoclinic.com/print/depression-and-exercise/MH00043/METHOD=print
175 http://www.intelihealth.com/IH/ihtIH/WSIHW000/8596/35226/363129.html?d=dmtContent
176 http://www.mayoclinic.com/print/clinical-depression/AN01057/METHOD=print
177 http://mentalhealth.samhsa.gov/publications/allpubs/ken98-0049/default.asp
178 http://www.lupus.org/webmodules/webarticlesnet/templates/new_learnliving.aspx?articleid=2256&zoneid=527
179 http://www.annals.org/cgi/reprint/149/10/1-56.pdf
180 http://www.nia.nih.gov/HealthInformation/Publications/depression.htm
181 http://www.nimh.nih.gov/health/publications/older-adults-depression-and-suicide-facts-fact-sheet/index.shtml
182 http://www.nimh.nih.gov/health/publications/women-and-depression-discovering-hope/how-is-depression-diagnosed-and-treated.shtml
183 http://www.healthyminds.org/Document-Library/Brochure-Library/Lets-Talk-Facts-
184 http://www.mentalhealthamerica.net/go/information/get-info/depression/depression-in-teens
185 http://www.mayoclinic.com/print/depression-treatment/AN00685/METHOD=print
186 http://www.nimh.nih.gov/health/publications/treatment-of-children-with-mental-
187 http://kidshealth.org/parent/emotions/feelings/understanding_depression.html
188 http://www.nlm.nih.gov/medlineplus/ency/article/000945.htm
189 http://www.blackdoginstitute.org.au/public/depression/treatments/psychological.cfm
190 http://www.blackdoginstitute.org.au/public/depression/treatments/index.cfm
191 http://www.healthlinkbc.ca/kbase/as/ug4845/actionset.htm
192 http://www.betterhealth.vic.gov.au/bhcv2/bhcarticles.nsf/pages/Depression_and_exercise
193 http://www.betterhealth.vic.gov.au/bhcv2/bhcarticles.nsf/pages/Depression_coping_and_recoveri
194 http://www.healthlinkbc.ca/kbase/as/ug4814/actionset.htm
195 http://www.healthlinkbc.ca/kbase/as/tn9165/actionset.htm
196 http://www.healthlinkbc.ca/kbase/nci/ncicdr0000062739.htm
197 http://www.depressionservices.org.au/treatments/exercise-2.html
198 http://www.healthlinkbc.ca/kbase/as/uf9919/actionset.htm
199 http://www.nhs.uk/pathways/depression/pages/treatment.aspx
200 http://www.nhs.uk/conditions/postnataldepression/pages/treatment.aspx
201 http://www.nhs.uk/conditions/depression/pages/treatment.aspx


Table A: URL of depression treatment web page samples 


12 Appendix B: Criteria for Sampling Depression Treatment Web Pages 
Web pages of the following nature were removed:

- pages which focus on other diseases instead of depression, or pages that address depression but discuss only non-treatment topics such as diagnosis (determination was based on document heading and sub-heading).
- pages protected by password.
- pages not in text format (e.g. video/audio clips, PPT slides).
- pages with tables or spreadsheets as the major part of page content.
- portal pages which do not have their own content, but just hyperlinks referring to other relevant pages (e.g. URL menus/categories, lists of search returns from search engines).
- pages which have article titles or bibliographic information only.
- home pages of businesses or organizations (e.g. medical center or depression clinic).
- pages too long for human rating (e.g. online books or chapters); they were filtered out because of the expense of human rating.
- advertisement pages which do not really provide depression treatment content, such as Amazon book advertisements.
- academic articles which are targeted at a professional audience instead of public online users; due to academic complexity, some very specific research questions and terminology can make the articles hard for most common users to understand and for human raters to rate.


13 Appendix C: Evidence-based Depression Treatment Guidelines 
Evidence-based Depression Treatment Guidelines Used in (Griffiths & Christensen, 2005) 


Evidence-Based Rating Scale for Human Raters (Copied from (Griffiths & Christensen, 2005)) 


The evidence-based rating scale was developed from statements in the treatment section of A systematic
guide for the management of depression in primary care published by the Centre for Evidence-based
mental health, Oxford (CEBMH, 1998).


1. Antidepressant medication is an effective treatment for major depressive disorder.

2. Antidepressants are all equally effective.

3. The effectiveness of antidepressants is around 50 to 60%.

4. Full psychosocial recovery can take several months.

5. Drop out rate is same for different antidepressants.

6. The side effect profile varies for different antidepressants.

7. The choice of antidepressant should depend on individual patient factors (e.g. presence of co-morbid
psychiatric or medical conditions, previous response to a particular drug, patient
preference regarding the desirability of specific side-effects, concurrent drug therapy, suicidal risk)

8. Antidepressants are not addictive.

9. A trial of 6 weeks at full dose is needed before a drug can be considered to have failed and
another tried.

10. A second-line drug should probably be from a different class of antidepressant.

11. Once improved continuation treatment at the same dose for at least 4-6 months should be
considered.

12. Discontinuation syndrome may occur with abrupt cessation of any antidepressant so
antidepressants should not be stopped suddenly. Where possible antidepressants should be
withdrawn over a 4 week period, unless there are urgent medical reasons to stop the drug more
rapidly. [To score 1, need to make general points that abrupt cessation can cause discontinuation
syndrome and that antidepressants should not be stopped suddenly]*

13. St John's Wort appears to be as effective as tricyclic antidepressants and causes fewer side effects,
but little is known about any long term adverse effects.**

14. Cognitive therapy can be an effective treatment for depression.

15. Cognitive behaviour therapy is at least as effective as drug treatment in mild-to-moderate
depression.

16. Cognitive behaviour therapy may be valuable for people who respond to the concept of Cognitive
behaviour therapy, prefer psychological to antidepressant treatment, have not responded to
antidepressant therapy. [Score 1 if mention at least one of these]

17. Problem-solving may be effective for depression. 

18. [Generic] counselling is probably no more effective than treatment as usual from the GP for 
depression. 

19. Written information (usually based on a cognitive model of depression) can improve mild-to- 
moderate depression. [Score 1 if cognitive model] 

20. Exercise can be effective - alone or as an adjunct to other treatments. 


For each item, score 1 if the site information is consistent with the statement. Cumulate item scores
across the scale to yield a total evidence-based score for the site.


* ** Guidelines 12 and 13 each contain multiple “meaning pieces”. They are split into multiple guidelines
in this study (see reasons in the article body).

12-A. Antidepressants should not be stopped suddenly. 

12-B. Abrupt cessation can cause discontinuation syndrome. 


13-A. St John’s Wort appears to be as effective as (tricyclic) antidepressants. 


13-B. St John’s Wort causes fewer side effects than (tricyclic) antidepressants. 
13-C. Little is known about any long term adverse effects of St John’s Wort. 
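
The scoring rule stated for the scale (one point per item the site is consistent with, summed over all items) can be illustrated with a minimal sketch. The function name, the item identifiers, and the example ratings are hypothetical and are not taken from Griffiths & Christensen (2005); the sketch covers the original 20-item scale, and the split items (12-A, 12-B, 13-A to 13-C) could be handled by changing the list of identifiers.

```python
from typing import Dict

# Item identifiers for the 20-item scale (assumption: items are keyed "1".."20").
ITEM_IDS = [str(i) for i in range(1, 21)]


def total_evidence_score(ratings: Dict[str, bool]) -> int:
    """Sum one point for every item the rater judged consistent with the site."""
    unknown = set(ratings) - set(ITEM_IDS)
    if unknown:
        raise ValueError(f"unknown item ids: {sorted(unknown)}")
    # Items that were not rated count as 0, i.e. not covered by the site.
    return sum(1 for item in ITEM_IDS if ratings.get(item, False))


# Hypothetical ratings for one web page: consistent with items 1, 6 and 14 only.
example_ratings = {"1": True, "6": True, "14": True, "20": False}
print(total_evidence_score(example_ratings))  # 3
```

Treating unrated items as 0 is a design choice of this sketch; a stricter version could require a rating for every item.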
Exploring Data Quality in Games With a Purpose

Abstract

A key problem for crowd-sourcing systems is motivating contributions from participants and ensuring 
the quality of these contributions. Games have been suggested as a motivational approach to encourage 
contribution, but attracting participation through game play rather than scientific interest raises concerns 
about the quality of the data provided, which is particularly important when the data are to be used for 
scientific research. To assess whether these concerns are justified, we compare the quality of data obtained 
from two citizen science games, one a “gamified” version of a species classification task and one a fantasy 
game that used the classification task only as a way to advance in the game play. Surprisingly, though 
we did observe cheating in the fantasy game, data quality (i.e., classification accuracy) from participants 
in the two games was not significantly different. As well, the quality of data from short-time contributors 
was at a usable level of accuracy. These findings suggest that various approaches to gamification can be 
useful for motivating contributions to citizen science projects. 
1 Introduction


A key problem for volunteer-based projects is motivating contributions from participants and ensuring the 
quality of these contributions. These concerns are interrelated, in that system designs intended to maximize 
the volume of contributions may do so at the cost of quality and vice versa. 

In this paper, we examine the interplay between motivation and quality of participation in the 
context of online citizen science projects. In citizen science projects, members of the general public are 
recruited to contribute to scientific investigations. Citizen science initiatives have been undertaken to 
address a wide variety of goals, including educational outreach, community action, support for conservation 
or natural resource management, and collecting data from the physical environment for research purposes. 
Many citizen science projects rely on computer systems through which participants undertake scientific 
data collection or analysis, making them examples of social computing (Cohn, 2008; Wiggins & Crowston, 
2011). 

Citizen science projects must address concerns about the quality of contributions, in this case, 
questions that arise from suitability of the generated data for the science goals of the projects. Data quality 
is a multi-dimensional construct (Orr, 1998; Pipino, Lee, & Wang, 2002; Wang & Strong, 1996), but the 
believability or accuracy of the data remains a particular concern for citizen science projects because many 
participants are not trained scientists. Their limited scientific knowledge may affect the accuracy
of the data they provide.

Cognizant of this concern, previous studies have examined citizen science data quality. For example, 
Galloway et al. (2006) compared novice field observations to expert observations, finding that observations 
between the two groups were comparable with only minor differences. Goodchild and Li (2012) explored 
mechanisms for assuring the quality of citizen-provided geographic information. Delaney et al. (2008) 
checked data quality in a marine invasive species project, finding that participants were 95% accurate in 
their observations. Their study did find that motivation had an impact on the final data set, with some 
participants failing to finish because of the tedious nature of the tasks. 

This last finding is notable because citizen science projects often rely on the inherent appeal of the 
topic to attract and motivate participants. For example, “charismatic” sciences like bird watching, 
astronomy, and conservation all have existing and enthusiastic communities of interest; a number of 
successful citizen science projects have grown up around these topics. 

While the intrinsic motivation of science is undeniably powerful, citizen science projects face limits 
on their available pools of participants, namely those who share a particular scientific interest. Less 
charismatic topics of inquiry that lack a large natural base of users could benefit from alternative 
mechanisms for motivating participants. Purposeful games have the potential to become one such 
mechanism. Games are recognized for their potential to motivate and engage participants in human 
computation tasks (e.g. Deterding, Dixon, Khaled, & Nacke, 2011; Law & von Ahn, 2009; McGonigal, 2007, 
2011; von Ahn, 2006; von Ahn & Dabbish, 2008) and so seem to offer great potential for increasing the pool 
of contributors to citizen science projects and their motivation to contribute. 

Relying on games to motivate participation may have negative tradeoffs with data quality. Games 
are meant to be entertaining, and players may find themselves concentrating only on the fun elements of a 
game, ignoring, neglecting, or even cheating on embedded science tasks to get them over with quickly. 
Games that are designed to prevent such behaviors may improve data quality but not be fun for players 
and so fail to attract very many participants. 

The interrelated issues of game-driven participant engagement and citizen science data quality are
of interest to game designers, human-computation systems designers, HCI researchers, and those involved
with citizen science. It is important for these various constituencies to understand how citizen scientists
produce data using games, how accurate that data can be, and how different approaches to “gamification”
influence player motivation and data quality. In this paper, we address these questions.


2 Theory: Gamification and Games With a Purpose 


The goal of most so-called “gamification” is to use certain enjoyable features of games to make non-game 
activities more fun than they would otherwise be (Deterding, Dixon, et al., 2011; Deterding, Sicart, Nacke, 
O'Hara, & Dixon, 2011). Often, the term gamification refers to the use of things like badges and points to 
place a “game layer” on top of real-world activities, especially in corporate, governmental, or educational 
settings. However, this usage is heavily contested by game designers and scholars, with some going so far 
as to criticize these approaches as “exploitationware” (Bogost, 2011). As Bogost (2011) and others have 
pointed out, points, badges, rewards, scores, and ranks do not really engage players; that is, they are not 
core game mechanics themselves. Rather, these are just metrics by which really meaningful interactions — 
the play experiences that truly compel and delight players — are measured and progress is recorded. To 
remove meaningful aspects of play and yet retain these measurement devices is to produce something that 
is not really a game at all (Bogost, 2011; Deterding, Dixon, et al., 2011; Deterding, Sicart, et al., 2011; Salen 
& Zimmerman, 2004). 

To conceptualize different approaches to creating games, we distinguish two kinds of rewards
that a game might offer, drawing on diegesis, a term from the study of film that separates
the “story world” from the “real world” (De Freitas & Oliver, 2006; A. R. Galloway, 2006;
Stam, Burgoyne, & Flitterman-Lewis, 1992). Diegetic rewards in games are those that have meaning within
the game but no real value outside of it. For example, a diegetic game reward might be an upgraded weapon 
given to the player by a game character upon finishing a quest. The weapon has meaning in the game and 
is strongly tied to the story and the game world. Conversely, non-diegetic rewards are those that have only 
limited connection to the game world, but sometimes (not always) can have meaning in the real life of the 
person playing the game. For example, “achievements” (a kind of merit badge) are a common non-diegetic 
reward used in entertainment games. Players can collect achievements by performing certain actions within 
the game (e.g., “kill ten enemies in ten seconds,” or “collect 1 million coins”). Non-diegetic rewards like 
badges, points and scores are frequently used in citizen science games to acknowledge player accuracy, time 
spent, effort, or milestone accomplishments. 

Because non-diegetic rewards are weakly tied to the game world (at best) and do not deeply impact 
the game experience, players are likely to value them only to the extent that they value the actual 
accomplishments for which they are awarded. For “science enthusiast” players who truly engage with the 
scientific elements of citizen science games, non-diegetic rewards might have great significance; however it 
is possible that such players do not really need a game to motivate them in the first place. For “non- 
enthusiast” players, non-diegetic rewards likely have more limited appeal. If the real-world science activity 
itself is not highly valued, non-diegetic rewards for working on it will also not be valued. For these players, 
non-diegetic rewards are probably not an effective approach. Not surprisingly, many scholars and designers 
have become disenchanted with the typical connotation of the term “gamification,” finding it laden with 
inappropriate emphasis on performance metrics like badges and points. 

Yet non-enthusiast players are those most likely to find value in a game layer that can turn “boring 
science” into “play.” Diegetic rewards may be more engaging and more meaningful for non-enthusiasts, 
underscoring, as they can, the game story, game world, and game play instead of the real-world task. 
Diegetic rewards can thus become a powerful form of feedback to keep non-enthusiasts immersed in a game 
that only occasionally asks them to undertake a science task. The benefits of employing diegetic rewards
for those managing citizen science initiatives (i.e., an enhanced ability to attract and engage non-enthusiast
participants) are also apparent.




Many alternatives to the term “gamification” have been proposed: “games with a purpose,” “serious
games,” “productivity games,” “persuasive games,” and “meaningful games” (Bogost, 2011; Deterding,
Dixon, et al., 2011; Deterding, Sicart, et al., 2011; McGonigal, 2011; Salen & Zimmerman, 2004). These 
terms describe flexible approaches to gamification where diegetic rewards are common instead of rare, and 
game designers seek to craft meaning within the game world itself. In-game money and items are simple 
examples, but more abstract rewards also qualify as diegetic, including the immersive exploration of a 
beautiful game world, the enjoyment of a rich game story, the joy of playing with fun game mechanics, or 
the player’s dialogue with game characters. Malone (1980) has noted how many of these can be motivating 
in the context of gamified experiences, specifically educational games. In this present study, we adopt von 
Ahn’s (2006) term “games with a purpose” and its variant, “purposeful games,” to distinguish diegetic 
reward approaches from non-diegetic “gamification.” In our view, these terms strongly convey the task- 
oriented nature of citizen science but also emphasize our broad view of games as entertainment media that 
should focus on engagement, play, meaning, and fun. 

Others have designed and studied entertainment games for citizen science. For example, Fold.It¹
and Phylo² are both citizen science projects in the form of entertaining games that have attracted substantial
numbers of players and produced large amounts of scientific data. To date, however, there has been little
formalized comparison of diegetic and non-diegetic rewards in gamified experiences, particularly as these
relate to player performance and data quality. In particular, to our knowledge there has been no formalized
comparison made of these different approaches using the same citizen science task as a basis for two very
different modes of gameplay. Yet it is plausible that different reward structures and philosophies of
gamification could impact player experience and subsequent performance independent of the task itself. For
example, in most gamified citizen science activities, players are never allowed to stray very far from the
tasks they are supposed to be doing. Players earn points and other rewards specifically for engaging with
the science, and these data analysis activities comprise the majority of the game experience. Such games
inherently place emphasis on the science, providing players with few opportunities or reasons to neglect the
work.

1 http://www.fold.it
2 http://phylo.cs.mcgill.ca/

On the other hand, our understanding of diegetic rewards suggests an alternative approach whereby 
players engage with an entertainment-oriented game world that only occasionally requires them to act as a 
“citizen scientist.” In this approach, the science task becomes just one mechanic among many, and not 
necessarily the most important or compelling of the game. Though this could heighten the chances of 
attracting non-enthusiast players, it may also be that these players will ignore or neglect the science in lieu 
of playing other parts of the game, potentially reducing data quality. Even cheating — i.e., knowingly 
submitting bad data — could be beneficial to players who are fixated on the entertainment experience and 
so motivated to skip over the science work. 

To explore these issues we designed two very different games around the same purposeful activity 
in order to study the impact of different approaches to gamification on data. One game adopted a 
straightforward gamification approach, rewarding players for performance with non-diegetic score points 
and focusing primarily on the science task. The second was an entertainment-oriented purposeful game, a 
point-and-click science fiction adventure where the science task was integrated alongside many other play 
mechanics (exploration, puzzle solving, item collection, virtual gardening) and designed as a means for 
advancing in the game. Rewards in this game were diegetic, and included game money as well as the ability 
to interact with various characters, progressively explore the game world, and advance the game story. 

Game designers and HCI researchers are also likely to find our work interesting. Very few purposeful 
games have been explicitly designed as story experiences featuring diegetic reward structures, and almost 
none have been built in a design science tradition with scholarly study as a key goal of the design and 
development process (Hevner, 2007; Hevner, March, Park, & Ram, 2004; Prestopnik, 2010). Our unique 
context (citizen science) and design-based approach to study can extend our understanding of purposeful 
game design, particularly with regard to the quality of data that different game design philosophies may 
achieve. 


3 Research Questions 


We developed a guiding set of research questions with our overarching scholarly interests in mind. First, 
we wanted to know how our two games would differ in their ability to sustain participation and retain 
participants. Therefore, we address the question: 


RQ 1: How does player retention differ between a gamified task and an entertainment-oriented 
purposeful game? 


Second, the distinct reward systems and play experiences offered by our two games raised the concern that 
data quality (i.e., accuracy) might vary between the two games. If one gamification approach does indeed 
lead to measurably poorer data quality than another, that approach may be unsuitable for many kinds of 
citizen science tasks. We therefore address the question: 


RQ 2: How does the quality of data produced by players differ between a gamified task and an 
entertainment-oriented purposeful game? 
Third, a common phenomenon in citizen science and many other forms of crowdsourcing is that a few
“power” users provide the majority of the work, while a “long tail” of casual participants may provide only 
a small amount of labor each (Anderson, 2008). That is, many people may be curious enough to try a new 
system (the long tail of many participants, with few contributions each), but only a few will find it 
interesting enough to participate at a high level (the few power users who make many contributions each). 
As there are many players in the long tail, the combined number of classifications provided (even by less 
motivated individuals) can be large. If it takes a long time or much effort for a player to learn the science 
task well enough to provide quality data, however, then the contributions from the long tail, while 
voluminous, may be scientifically worthless. We therefore address one final question:


RQ 3: How is data quality affected by the number of classifications a participant provides? 


4 System Development 


The two purposeful games that we designed to address these questions were centered on a science activity, 
the taxonomic classification of plants, animals and insects. In sciences such as entomology, botany, and 
oceanography, experts and enthusiasts routinely collect photographs of living things. When captured with 
digital cameras or cell phones, photographs can be automatically tagged with time and location data. This 
information can help scientists to address important research questions, e.g., on wildlife populations or how 
urban sprawl impacts local ecosystems. Time and location tagged photos are only valuable, however, when 
the subject of the photograph (the plant, animal, or insect captured) is known and expressed in scientific 
terms, i.e., by scientific species name. This information is rarely recorded when the photograph is captured 
in the field by scientists or amateur enthusiasts. 

To classify specimens, biologists have developed taxonomic keys that guide the identification of 
species. These keys are organized around character-state combinations (i.e., attributes and values). For 
example, a character useful for identifying a moth is its “orbicular spot,” with states including, “absent,” 
“dark,” “light,” etc. Given sufficient characters and states assigned to a single specimen, it is possible to 
classify to family, genus, and even species. However, taxonomic keys are usually written for expert users, 
and are often complex, highly variable, and difficult to translate into a form that will be suitable for use in
a human computation system (much less games).

Working within this area of the life sciences, we developed Citizen Sort, an ecosystem of purposeful 
games designed to let non-scientist members of the public apply taxonomic character and state information 
to large collections of time and location tagged photographs supplied by experts. We also conceptualized 
Citizen Sort to be a vehicle for HCI researchers to explore the intersecting issues of citizen science data 
quality and purposeful game design. 

Citizen Sort features two purposeful games. The first, Happy Match, is a score-based matching 
game that places the science activity in the foreground of the game, and seeks to attract “enthusiast” players 
who may already hold some interest in science, classification, or a particular plant, animal, or insect species. 
It may be considered a form of “gamified task,” in that it is very much like a tool with a non-diegetic, 
points-based game layer added to it. 

Happy Match can be played using photographs of moths, rays, or sharks (Happy Moths, Happy Rays, 
and Happy Sharks respectively). Players earn high scores by correctly classifying the character-states of 
specimens in photographs for which the answers are known, by dragging each photograph to the correct 
state, character by character (see Figure 1). The few known “happy” photos are mixed with other photos 
that are still to be classified, i.e., for which the character-state information needs to be collected.


Figure 1: The Happy Match classification interface.


At the end of the game, players receive feedback about the correctness of each of the character-state choices 
for the known “happy” photos and a score based on their performance (Figure 2). The “happy” photos are 
revealed only at the end of the game, so players must strive to perform well on all photos to ensure a good 
score. 


[Figure 2 screenshot: score screen listing the points earned for each Happy Moth, the number of Happy Moths classified correctly, and the total points.]


Figure 2: The Happy Match score interface.


Happy Match rewards players with points based upon their performance. However, we would argue that 
Happy Match differs from what Bogost (2011) calls “exploitationware” in that it is designed to be
a meaningful experience for certain players: science enthusiasts who already have an interest in science, 
nature, living things, or classification. While Happy Match’s non-diegetic points have only limited meaning 
for players who do not care about these, they are a meaningful performance metric and reward for those 
who do. 

The second game, Forgotten Island, has players perform the same classification task as Happy
Match, one photo at a time, including the use of known photos as a way to check accuracy (Figure 3).
However, the classification activity in Forgotten Island is situated within an interactive point-and-click
adventure story set in a vibrant, science fiction game world (Figure 4). Rather than points, players are
rewarded with in-game money (a diegetic reward) for each classification. This money is spent to acquire
further diegetic rewards: equipment and items that can advance the plot and open up new game spaces to
explore. To motivate effort, an incorrect answer for a known photo results in a warning and a slight deduction
in game money.

The Forgotten Island game experience — the game world and the story — is designed to be a form 
of continuous diegetic reward as it unfolds, as are (more concretely) the in-game money and equipment 
earned by players. All of these things have only limited meaning outside of the game, but can be important 
to players within the context of Forgotten Island. 

Our intention in developing these two games was to explore some of the relative advantages and 
disadvantages of the two approaches. Scientists who envision purposeful games as an aspect of their 
crowdsourced scientific data collection or analysis activities need to understand how different game 
experiences lead to different player behaviors, as well as (potentially) different data outcomes. 


Figure 3: The Forgotten Island classification interface.


Figure 4: The Forgotten Island game world.


5 Method 


To explore our research questions regarding motivation and data quality in the classification activity, we 
drew upon data generated by players of Forgotten Island and Happy Match who played using photos of 
moths (since this is the only dataset currently used in both games). For some additional analysis, we also 
drew upon data from other versions of Happy Match that used photos from different datasets (Happy Rays 
and Happy Sharks). 

Participants were recruited naturalistically online, learning about the project and the games from 
news posts, comments, and listings that appeared on various citizen science websites and in science 
publications such as National Geographic and Scientific American. Citizen Sort and its two games are easy 
to find with online searches for citizen science activities and, in some communities, by word of mouth. This 
is to say that the participants for this study came to the project in a manner similar to any other current 
citizen science project. Citizen Sort’s user base is therefore likely to be representative of many other citizen 
science initiatives. 

The number of Citizen Sort users is growing. The data presented in this paper is drawn from 
approximately 900 user accounts, excluding developer accounts and approximately 100 temporary players 
(i.e., players who are 13 years of age or younger whose performance is not tracked between visits; under US 
law, users must be over 13 to create an account). 

Relying on data from naturalistic participation has advantages and disadvantages for our study. 
The main disadvantage is the lack of control: we cannot say if the differences we observe between the two 
games are due to differences in the features of the games or to differences in the participants who choose to
play the games or (most likely) some combination. However, this confounding of game and players is
simultaneously a feature of our study: practitioners attempting to deploy such systems would also be
constrained by the characteristics of the audiences attracted. Put differently, we conceptualize the
comparison we are drawing as one between socio-technical systems that comprise both the games themselves
and the specific kinds of players they attract.


6 Findings


We ran a variety of tests on Citizen Sort’s classification and player data. In this section, we present the
results of this analysis. 


RQ 1: How does player retention differ between a gamified task (Happy Match) and an 
entertainment-oriented purposeful game (Forgotten Island)? 


To address this question, we compared the retention of players for Happy Match and Forgotten Island.
Retention was measured as the number of days on which a player visited a game and made contributions. The
distribution of player visiting days was highly skewed: most players played the game on only one day (87% of
players for Happy Match and 74% for Forgotten Island) and a few “power” players played for many days.
Therefore, we used the non-parametric Wilcoxon rank sum test to compare retention between the two games. We
found a significant difference between the two games (p = 0.002), with “power” players playing Happy
Match for significantly more days.
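
The rank-based comparison described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: it computes the rank sum statistic with midranks for ties and a two-sided p-value from the normal approximation, and the function names (`midranks`, `rank_sum_test`) and the example day counts are hypothetical.

```python
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(sample_a, sample_b):
    """Two-sided rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for sample_a, approximate p-value).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    n = n_a + n_b
    pooled = list(sample_a) + list(sample_b)
    ranks = midranks(pooled)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Tie correction: each group of t tied values reduces the variance.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # all values identical: no evidence of a difference
    z = (u_a - mean_u) / var_u ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Hypothetical days-played counts per player (illustrative only).
happy_match_days = [1, 1, 1, 1, 2, 5, 12]
forgotten_island_days = [1, 1, 1, 2, 2, 3]
u_stat, p = rank_sum_test(happy_match_days, forgotten_island_days)
print(f"U = {u_stat}, approximate p = {p:.3f}")
```

The normal approximation is reasonable for samples of the size reported here; for very small samples an exact test would be preferable.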

Figure 5 shows the distribution of the number of scientific contributions (i.e., classifications) in the 
two games. The retention differences between Happy Match and Forgotten Island are also apparent in Table 
1, comparing the percentage of retained players after just one classification decision, after 20 decisions, and 
after 50 decisions. Similar to many online systems, both games see a high initial attrition: when players try 
the game for the first time, most quickly lose interest and do not return. Attrition for Happy Match appears 
to continue at a steady rate, with only a small core set of “power” players continuing to contribute regularly.


Figure 5: Distribution of number of decisions contributed by Happy Match and Forgotten Island players.


                     Retained at    Retained at     Retained at
                     1 Decision     20 Decisions    50 Decisions
Forgotten Island     45%            32%             16%
Happy Moths          92%            79%             33%
Happy Rays           93%            76%             38%
Happy Sharks         89%            63%             21%


Table 1: Percent of players retained by number of decisions made.


In contrast, for Forgotten Island, the rate of attrition seems to fall off after a larger initial loss. Note that
players do not make their first classification decision until some way into Forgotten Island. Some players 
may decide that they are not interested in Forgotten Island’s story and game world before making any 
classifications, which may explain the more rapid drop-off in retained players at the point of making the 
first classification decision. However, it seems that if a player does find their interest captured by Forgotten 
Island they are more likely to continue playing until the end of the game. Note also that unlike Happy 
Match, Forgotten Island can be “won” and its story eventually concludes. While players can continue to 
play parts of the game, for the most part, Forgotten Island is finished at this point. It takes about 320 
classification decisions to win Forgotten Island. 
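
Retention figures of the kind reported in Table 1 could be derived from per-player decision counts along the following lines. This is a sketch under stated assumptions: "retained at k decisions" is taken to mean that a player made at least k decisions, players with zero decisions are kept in the denominator, and the function name `retention_at` and the sample counts are hypothetical.

```python
def retention_at(decision_counts, thresholds=(1, 20, 50)):
    """Percent of players who made at least k decisions, for each threshold k.

    decision_counts holds one integer per player, including players who
    started a game but never made a classification decision (count 0).
    """
    total = len(decision_counts)
    if total == 0:
        return {k: 0.0 for k in thresholds}
    return {
        k: 100.0 * sum(1 for count in decision_counts if count >= k) / total
        for k in thresholds
    }


# Hypothetical decision counts for ten players (illustrative only).
counts = [0, 0, 0, 4, 8, 20, 24, 36, 64, 320]
for k, percent in retention_at(counts).items():
    print(f"retained at {k} decisions: {percent:.0f}%")
```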


RQ 2: How does the quality of data produced by players differ between a gamified task and an 
entertainment-oriented purposeful game? 


We expected that Happy Match players would show better data quality than Forgotten Island players 
because Happy Match was designed to be classification task-focused and Forgotten Island was entertainment 
and adventure-focused, with the science task as a side element of the game. To test the difference between 
the two games, we compared classification accuracy for players of Forgotten Island to Happy Moths, both 
of which use the moth photo dataset. 

We computed accuracy by comparing players’ answers for pictures to the known correct answer. 
To increase the pool of classifications for the comparison, we ran the game using only pictures for which we 
already knew the species of moth represented. However, there is not a one-to-one mapping from species to 
state (e.g., individuals of a particular species can be different colors). We counted as correct any of the 
possible answers, which inflated the computed accuracy. 
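
The scoring rule described above, in which any state that is possible for the known species counts as correct, can be sketched as follows. The data layout (tuples of photo, character, and chosen state, plus a mapping to sets of acceptable states) and the name `player_accuracy` are assumptions made for illustration, not the Citizen Sort implementation.

```python
def player_accuracy(answers, acceptable_states):
    """Fraction of a player's scored decisions that match an acceptable state.

    answers: iterable of (photo_id, character, chosen_state) tuples.
    acceptable_states: dict mapping (photo_id, character) to the set of
        states considered correct for that photo's species.
    Decisions on photos without a known answer are skipped.
    Returns None when the player has no scored decisions.
    """
    scored = 0
    correct = 0
    for photo_id, character, chosen_state in answers:
        key = (photo_id, character)
        if key not in acceptable_states:
            continue
        scored += 1
        if chosen_state in acceptable_states[key]:
            correct += 1
    return correct / scored if scored else None


# Hypothetical known answers: one species can show more than one state.
known = {
    ("moth_01", "orbicular spot"): {"dark", "light"},
    ("moth_02", "orbicular spot"): {"absent"},
}
answers = [
    ("moth_01", "orbicular spot", "light"),
    ("moth_02", "orbicular spot", "dark"),
    ("moth_03", "orbicular spot", "dark"),  # unknown photo, not scored
]
print(player_accuracy(answers, known))
```

Because several states can be accepted for one photo, a score computed this way is an upper bound on strict accuracy, which is the inflation noted above.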

We restricted the sample to people who had made a minimum of 20 classification decisions on moths
(equivalent to 5 photos, since classifying each photo requires 4 decisions). We compared the accuracies of 
players of the two games using a two-sample t-test. To our surprise, our results showed no significant 
difference in the accuracy of the data provided by Happy Match and Forgotten Island players. 


                       N (sample size)    Classification Accuracy
Happy Match (Moths)    289 players        0.806
Forgotten Island       81 players         0.802

p-value = 0.746


Table 2: Comparing classification accuracy.
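
A comparison like the one in Table 2 can be sketched with the standard library alone. The paper does not state which variant of the two-sample t-test was used; the sketch below assumes the unequal-variance (Welch) form and approximates the two-sided p-value with the normal distribution, which is only reasonable for large samples. The function name `welch_t` and the accuracy values are hypothetical.

```python
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic, degrees of freedom, and approximate p-value.

    Each sample needs at least two values and the samples must not both be
    constant, otherwise the standard error is zero.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = variance(sample_b) / n_b  # squared standard error of mean B
    t_stat = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Assumption: normal approximation to the t distribution (large samples).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_approx


# Hypothetical per-player accuracies (illustrative only).
happy_match_accuracy = [0.82, 0.79, 0.85, 0.77, 0.81, 0.80]
forgotten_island_accuracy = [0.80, 0.83, 0.78, 0.76, 0.84]
t_stat, df, p = welch_t(happy_match_accuracy, forgotten_island_accuracy)
print(f"t = {t_stat:.3f}, df = {df:.1f}, approximate p = {p:.3f}")
```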


Despite the overall similarity in accuracy, we did find some evidence that player classification behaviors 
and the accuracy of data produced by players interact and vary between the two games. Specifically, in 
Forgotten Island we observed a number of instances of “cheating” behavior, identified by checking the mean 
time spent by a player on a single classification and the overall accuracy of those classifications. Cheaters 
had a distinct signature: very rapid decision making with low accuracy (at the level of chance). Neither low 
accuracy nor rapid decision making were, by themselves, indicators of cheating. “Power” players who were 
deeply invested in either Forgotten Island or Happy Match often became proficient enough to rapidly make 
accurate classification decisions, and some players simply struggled with classification. However, fast 
classifications coupled with poor accuracy seemed to indicate the profile of a player more interested in game 
play than in classification. 
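
The cheating signature described above (rapid decisions combined with chance-level accuracy, with neither condition sufficient alone) can be expressed as a simple filter. This is a sketch only: the thresholds, the chance level, the record layout, and the name `flag_suspected_cheaters` are hypothetical, and the study identified cheaters by inspecting the data rather than by a fixed rule.

```python
def flag_suspected_cheaters(players, *, chance_accuracy, accuracy_margin, max_mean_seconds):
    """Return ids of players who are both very fast and near chance accuracy.

    players: dict mapping player id to a dict with keys
        "mean_seconds" (mean time per classification) and
        "accuracy" (fraction correct on known photos).
    A player is flagged only when BOTH conditions hold, so fast experts and
    slow players who struggle are left alone.
    """
    flagged = []
    for player_id, stats in players.items():
        is_fast = stats["mean_seconds"] <= max_mean_seconds
        near_chance = stats["accuracy"] <= chance_accuracy + accuracy_margin
        if is_fast and near_chance:
            flagged.append(player_id)
    return flagged


# Hypothetical player summaries and thresholds (illustrative only).
players = {
    "expert": {"mean_seconds": 1.5, "accuracy": 0.92},
    "struggling": {"mean_seconds": 9.0, "accuracy": 0.30},
    "clicker": {"mean_seconds": 0.8, "accuracy": 0.26},
}
print(flag_suspected_cheaters(
    players, chance_accuracy=0.25, accuracy_margin=0.05, max_mean_seconds=2.0
))
```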

Figure 6 plots Forgotten Island response time against performance for individual players. Red circles 
represent data for the first 20 photos and green circles represent data for all photos for a player. The two 
green circles in the lower left of the chart represent players whose performance decreased to the level of
chance as their response time per question also decreased, which we interpret as evidence of cheating.


Figure 6: Performance vs. response time in Forgotten Island.


RQ 3: How is data quality affected by the number of classifications a player provides? 


As expected, the number of classifications made by players of Happy Match and Forgotten Island exhibits 
a highly skewed “long tail.” For example, in Happy Moths, just 4.4% of players contributed 50% of the 
decisions. 63% of players played only one game (including those who did not finish), 37% played at least 
two games, 19% played at least three games, 12% played at least four games, and only 8% played at least 
five games. 
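
A concentration figure such as "4.4% of players contributed 50% of the decisions" can be computed from per-player decision counts as sketched below. The function name `share_of_players_for` and the sample counts are hypothetical.

```python
def share_of_players_for(decision_counts, target_share=0.5):
    """Smallest fraction of players, most active first, whose combined
    decisions reach target_share of all decisions."""
    total = sum(decision_counts)
    if total == 0:
        return 0.0
    running = 0
    ordered = sorted(decision_counts, reverse=True)
    for n_players, count in enumerate(ordered, start=1):
        running += count
        if running >= target_share * total:
            return n_players / len(decision_counts)
    return 1.0


# Hypothetical counts: one power player and a long tail of casual players.
counts = [400, 60, 40, 24, 20, 20, 12, 8, 8, 8, 4, 4, 4, 4, 4]
print(f"{100 * share_of_players_for(counts):.1f}% of players supply half of the decisions")
```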

We expected that there would be a positive correlation between the number of classifications that 
players contribute and their performance, that is, that players would learn the game and the characters and 
states and so improve their performance. We used Spearman rank correlation to measure the relationship, 
and we restricted the sample, as before, to people who had contributed a minimum of 20 decisions. The 
results are shown in Table 3 and graphed in Figure 7. To our surprise, we did not find a significant 
correlation for any of the four games (Happy Moths, Happy Rays, Happy Sharks, or Forgotten Island), 
meaning that those who contributed longer were not more accurate. 


                     N (sample size)    Rho       p-value
Forgotten Island     81                 -0.043    0.700
Happy Moths          289                -0.098    0.096
Happy Rays           208                0.070     0.317
Happy Sharks         107                -0.056    0.569


Table 3: Correlation between number of classifications and accuracy.
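
For readers unfamiliar with the statistic in Table 3, Spearman's rho is the Pearson correlation of the ranks of the two variables. The following sketch computes it with midranks for ties; it is an illustration rather than the analysis code used here, and the function names and sample values are hypothetical.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation; returns None if either variable is constant."""
    rank_x, rank_y = midranks(x), midranks(y)
    n = len(x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


# Hypothetical (decisions contributed, accuracy) pairs for six players.
decisions = [20, 24, 48, 120, 400, 2000]
accuracy = [0.81, 0.78, 0.84, 0.79, 0.80, 0.82]
print(spearman_rho(decisions, accuracy))
```

A rank-based measure suits this data because the number of decisions per player is highly skewed, so a few power players would otherwise dominate a linear correlation.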


iConference 2014 Nathan Prestopnik et al. 


[Figure 7 plot: average accuracy of the player (y-axis) against number of classification decisions contributed by a player (x-axis, ticks from 20 to 5000), shown for Happy Match (Moths).]


Figure 7: Player performance vs. contributions.


7 Discussion 


The most interesting findings from the comparison above were the overall similarity between the two games, 
with the exception of cheating, and the lack of a learning effect. 


7.1 Cheating Behavior 


Cheating behavior was apparent only in Forgotten Island, underscoring how non-diegetic and diegetic 
reward systems can have different impacts on player behaviors and data quality. There is little reason for 
Happy Match players to cheat: for power players, achieving a score-based reward without also achieving 
some meaningful experience would be pointless, while for long-tail players, neither the points nor the game 
experience are worth the effort of cheating. Power players will attempt to do well because they are personally 
interested in doing so, while long-tail players who are uninterested in the science activity simply stop playing 
Happy Match. 

Forgotten Island, on the other hand, has built-in incentives that make cheating more likely and 
potentially beneficial to certain players. The diegetic reward system connects classification activity to
in-game rewards like game money, new areas to explore, new puzzles to solve, and new story elements to
engage with. Players who are interested in the science activity may still want to do well on it. However,
for players who enjoy the game but not the science task, cheating will make the overall game faster and 
easier, allowing players to focus on the diegetic rewards — game money, the game world, and the story — 
rather than the work required to progressively experience them. As a result, cheating may be an attractive 
proposition for players who realize that they can still make enough money to play through the game even 
when doing poorly in the classification task. 

Forgotten Island is currently configured so that cheating is a feasible strategy, a decision driven by 
the overall game experience and not just the need for accurate classifications. It would be possible to adjust 
the classifier in Forgotten Island to discourage cheating more strongly. Players who answer incorrectly on 
known photos could be punished to the point that making money would be impossible without carefully 
attending to the classification task. However, feedback collected during play tests and other evaluation
exercises for both Happy Match and Forgotten Island (Prestopnik & Crowston, 2012) suggests that species
classification is inherently difficult to do well, and that many honest players struggle to do a good job. 
Configuring Forgotten Island to make cheating impossible could easily render the game too difficult to play, 
as non-cheating players would be regularly punished for well-intentioned but incorrect answers. Accordingly,
the game was configured to be easy so that players can continue to make progress even though this design
choice means that cheating is viable for players who realize it. 

Overall though, there was no significant difference in performance between Happy Match and 
Forgotten Island, comparing players who made a minimum of 20 classification decisions (i.e., the level of 
cheating was not high enough to significantly affect the overall results). This finding suggests that both 
diegetic and non-diegetic reward systems can be viable for citizen science human computation tasks. 
However, precautions should be taken to identify and exclude data from cheaters or outliers who may be 
more interested in the game’s entertainment experience than its science, e.g., by including a few known 
items to detect poorly performing players and omitting their data from analysis. 


7.2 Lack of Learning Effects and the Value of the Long Tail 


We found no evidence for learning effects in either Happy Match or Forgotten Island when looking at players 
who had classified at least 5 photographs. This is an interesting, unexpected and useful finding. Many 
citizen science initiatives heavily rely on power players to provide the majority of data. In our exploration 
of player behaviors we noticed this division of labor as well, with 4.4% of players contributing 50% of the 
classification decisions in the Citizen Sort system. These “power players” provide the bulk of scientific data 
and so are critical to the success of the project. In other settings, though, value can also come from the
lower-volume mass. For example, Anderson (2008) espoused the value of the “long tail” in the context of online
marketplaces. Though most items in a market may sell only a few units each, the cumulative sales of the
tail can be comparable to those of the few best-selling items that seem at first to be more lucrative. Similarly, 50%
of classification decisions in Citizen Sort came from what we dub long-tail players. However, verifying that 
long-tail classifications are as accurate as power-player classifications is important, because if they were not, 
their 50% of the data would be useless. The lack of a learning effect coupled with the acceptable accuracy 
found in Happy Match and Forgotten Island suggests that long-tail classifications are not a waste. The 
overall accuracy of classifications generated by players is relatively consistent over time and at a high enough
level that new players, even those who leave shortly after trying a game, can provide data that is usable
and comparable in quality, if not quantity, to long-term power players. 

The usefulness of long-tail classifications raises another interesting issue regarding the design of 
purposeful games and gamified tools: the distinction between games that are genuinely engrossing to play 
and games that merely look engrossing to play. Game designers aspire to the former, hoping to produce 
great experiences for players that will keep them entertained for hours, days, months, and even years. In 
the context of purposeful activities, however, there can still be value in producing games that fail to achieve 
this standard but still attract a critical mass of short term, “long tail” players. These games may be 
intentionally designed as short-term experiences, or may simply be games that fail to live up to their 
promise. Either way, if they look interesting and are tried by enough players, they may very well produce 
data that is useful. 

Are such games a form of Bogost’s (2011) so-called “exploitationware?” If the intention is to attract 
players with false promises about the game experience, the answer must be “yes.” However, if the intention 
is simply to create a good, short-term experience for players, the answer may be “no.” Furthermore, while 
game designers never aspire to create bad games, for a variety of reasons, bad and mediocre games are far 
more common than great ones (Schell, 2008). Given the resources required to create an entertainment- 
oriented purposeful game, it is reassuring to know that even modestly engaging games can still produce 
meaningful data if they are tried by enough short-term players. Though not ideal, this effect mitigates at 
least some of the risks involved in producing purposeful games. It may also give scientists leeway to 
contemplate the design of game experiences that aspire to more than task-focused gamification.


8 Future Directions


While most citizen science games favor non-diegetic rewards and task-centric game play, Forgotten Island 
shows how diegetic rewards and a game world that is not tightly bound to the science activity can still 
produce data of value to scientists. This “game taskification” approach (Prestopnik & Crowston, 2012) 
raises interesting possibilities, among them the potential to create scientific research tools that are also 
commercial entertainment products. Two possibilities seem especially interesting: 1) develop and release 
games like Forgotten Island for profit, supporting scientific research and game development with sales of 
the game, or 2) partner with existing game studios to integrate science tasks into commercial titles. Each 
approach has advantages and disadvantages. 

For purpose-built citizen science games, the primary advantage is that the game can be exactly 
tailored to the science task, while the primary disadvantages are the time and resources required to plan, 
design, implement, release, and support the game as well as the difficulty of marketing and attracting 
players. 

For entertainment games that have science activities grafted onto them, the advantages and 
disadvantages are roughly reversed. Science activities may suffer in service to the entertainment game 
experience, even if development resources become less of an issue. Yet a for-profit game title that included 
a real world science activity, perhaps as a diegetically motivated mini-game, could have a potential 
marketing advantage over its competitors. 

It is easy to envision how “grinding” tasks found in many current game titles could be turned into 
real-world, purposeful activities. In many cases, this could be done without compromising the integrity of 
either the game experience or the science; for example, a space adventure game could easily integrate real- 
world astronomy activities, just as a plant biology activity might become part of an alchemy exercise in a 
medieval fantasy. As Happy Match and Forgotten Island demonstrate, data quality need not suffer unduly 
in entertainment-oriented games, as long as player activities are adequately measured so that bad data and 
unwanted player behaviors do not adversely impact the final data set. 


9 Limitations and Conclusion 


In this study we explored a variety of differences between two purposeful video games for citizen science. 
Specifically, we studied how the diegetic and non-diegetic reward systems of purposeful games and 
“gamified” tools shape play experiences, impact player activities, and, most significantly, affect data quality. 

We found that different reward systems and gamification approaches can certainly impact player 
recruitment and retention, as well as the ways that players experience purposeful games, but that these 
modalities need not adversely impact data quality. We also found that while most data in purposeful games 
for citizen science will be contributed by a few power players, the many players who make just a few 
contributions still provide quality data. The quality of contributions made by these long tail players does 
not appear to be adversely impacted by the specific reward structures or gamification approach that is used. 

A limitation of the current study is the approach taken to computing accuracy based on the species 
classification. To address this limitation, we are exploring more precise ways to compute accuracy. For 
example, with enough players, we could measure individual agreement with the consensus rating for a 
picture. 
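
The consensus-based measure proposed above could take a form like the following sketch, in which each player's answers are compared with the most common answer for the same picture and character. The data layout, the decision to skip items whose top answers are tied, and the name `consensus_agreement` are assumptions for illustration; note also that in this simple form a player's own vote contributes to the consensus.

```python
from collections import Counter, defaultdict


def consensus_agreement(ratings):
    """Per-player fraction of answers that match the plurality answer.

    ratings: list of (player_id, item_id, state) tuples, where item_id
        identifies one picture and character combination.
    Items whose most common answers are tied have no consensus and are skipped.
    Returns a dict mapping player id to agreement in [0, 1].
    """
    votes = defaultdict(Counter)
    for _, item_id, state in ratings:
        votes[item_id][state] += 1

    consensus = {}
    for item_id, counter in votes.items():
        (top_state, top_count), *others = counter.most_common()
        if others and others[0][1] == top_count:
            continue  # tie for first place: no consensus for this item
        consensus[item_id] = top_state

    agreed = Counter()
    seen = Counter()
    for player_id, item_id, state in ratings:
        if item_id not in consensus:
            continue
        seen[player_id] += 1
        if state == consensus[item_id]:
            agreed[player_id] += 1
    return {player_id: agreed[player_id] / seen[player_id] for player_id in seen}


# Hypothetical ratings from three players on two items (illustrative only).
ratings = [
    ("ann", "moth_01/spot", "dark"),
    ("ben", "moth_01/spot", "dark"),
    ("cal", "moth_01/spot", "light"),
    ("ann", "moth_02/spot", "absent"),
    ("ben", "moth_02/spot", "light"),
    ("cal", "moth_02/spot", "absent"),
]
print(consensus_agreement(ratings))
```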

In future, we hope to explore how game design, commercial game design in particular, and 
purposeful game design might intersect to reach greater numbers of players in service to the creation of 
meaningful play experiences, the economics of the game industry, and the data requirements of scientists.


Abstract

This paper explores the specific needs activists have of the technologies they use to manage their 
operations and promote their causes. To begin this exploration, we conducted two critical making 
workshops with participants who self-identified as activists and used craft materials — such as cardboard, 
color markers, pipe cleaners, etc.—to create speculative technologies to find commonalities across 
different forms of activist work, be it technological, organizational, or procedural. The needs and concerns 
expressed in the workshops were articulated through the participants’ designs; they materialized their 
critiques, reflections, and explorations through their crafted prototypes. These prototypes point to 
opportunities for creating new design interventions to address the challenges and needs unique to activist 
organizations. The work suggests the need for more value-sensitivity and context-appropriateness in the
design of interactive systems.


1 Introduction


What would it mean to design technologies for activists or those who campaign for radical social change? 
How might the objectives of design in this context differ from already existing consumer technology, such 
as cars, personal computers, and mobile phones? What functionalities would be most important to activists 
and what types of new interactions would activists imagine if given the opportunity? This paper explores 
the specific needs that activists have of the technologies they use to manage their operations and promote 
their causes. 

Activist organizations are defined here as formal groups that enact direct, confrontational action,
such as protests or strikes, in order to advance their particular political or social agendas. These groups are
often political in nature and take aim at changing (rather than collaborating with) institutions. They work 
with little, if any, institutional support and thus operate with minimal support from foundations, public 
grants, or fundraising cycles (Goecks, et al., 2008). As a class of organization, activist groups share many 
features with non-profit social-service groups: both utilize a mostly volunteer workforce with high turnover 
(Harrison, et al.; Le Dantec and Edwards, 2008; McPhail, et al., 1998); they often employ people motivated 
by social justice issues or principled political positions but who lack specific technical expertise (Merkel, et 
al., 2007; Merkel, et al., 2004); and they must work with donated, aging, and obsolete technology (Voida, 
et al., 2011; Le Dantec and Edwards, 2008; Le Dantec and Edwards, 2010). Beyond these similarities, 
however, activist groups must operate under additional constraints in order to respond quickly to situations 
that arise within their communities of concern: for activist organizations, work is often urgent, 
unpredictable, and spontaneous as they stage public interventions and seek to raise awareness of the issues 
for which they are fighting. These conditions (having a controversial position in society, enacting ad hoc
action, and relying on limited resources) offer unique challenges for which it is difficult to prepare (Hirsch,
2009). The impact of these conditions on the design and use of technology for and in activism has been 
understudied and we argue that design interventions can help develop infrastructures that address the 
particular kinds of unforeseeable issues that arise specifically within the context of activist organizations. 
To begin an exploration of design for activism-specific technology, we conducted two critical making 
workshops with participants who self-identified as activists (Ratto, 2011; Cohn, et al., 2010; Sanders and 
Stappers, 2008; Hirsch, 2009). One workshop was done with a local housing justice organization, Occupy 
Our Homes Atlanta (OOHA), during which members imagined their use of technology during a typical
“action,” referring to radical practices specific to the organization, such as courthouse protests or home
liberations (described later in this paper). The second workshop was hosted at the Allied Media Conference 
and included participants from various organizations around the U.S. who discussed less radical methods of 
activist work, such as skillshare workshops or education outreach campaigns. In both workshops, 
participants used craft materials (such as cardboard, color markers, and pipe cleaners) to create 
speculative technologies, a way of imagining alternative, provocative solutions to present problems. 
The workshops were designed around a number of prompts to encourage conversation in hopes of finding 
commonalities across forms of activist work, be it technological, organizational, or procedural. The needs 
and concerns expressed in the workshops were articulated through participants’ designs; they materialized 
their critiques, reflections, and explorations of mobile and social technologies through their crafted 
prototypes. The collected reflections from the two workshops lead to three key areas that should be considered 
in the design of future technology aimed at supporting activist organizations. 


2 Method 


We explicitly draw on Ratto’s critical making model in which design workshops are structured around 
participants’ lived experiences and where material production based on those experiences is recognized as a 
form of knowledge production (Cohn, et al., 2010). There are three stages to critical making that scaffold 
participants’ framing of, creation around, and reflection on the issue (Ratto, 2011). The first stage involves the 
compilation of concepts and ideas that can be explored through making, or the act of material production. 
The second invites workshop participants to design and create prototypes that explore those concepts. The 
third and final stage is an iterative exploration of the alternatives embodied in the speculative prototypes. 
In the case of the speculative activist workshops we developed, the experiences of the participants informed 
critical perspectives of current power dynamics and structures in technology production, reflection on how 
those structures specifically impacted activist activities, and how technologies can be incorporated into 
existing radical practices to address common activist concerns and issues. 


3 Workshop Structure 


The goal of the speculative activist workshops was to engage participants from our two sites in conversations 
to interrogate their current uses of various technologies and to better articulate how they would like to use 
technology in future activist work. We accomplished this by structuring the workshop around three distinct 
activities, each corresponding to one of the three stages of critical making: we began with a discussion activity 
that prompted participants to reflect on the different ways technology was used during protest activities; 
we then structured a crafting design activity that built on themes from the discussion; finally, we reflected 
throughout the workshop on the alternatives—both those that already exist and those that might be 
speculated through design. 

During the discussion activity, participants spoke about activities they considered part of their 
activist work. These conversations broadly focused on what technologies were used and in what way (e.g. 
using social media to broadcast messages to different audiences). In the OOHA workshop, participants were 
split into groups of three and asked to describe how they used information technology at different points 
during a protest action (before, during, and after). In order to scaffold the discussion of how they worked 
with different technologies, OOHA workshop participants were given a set of cards with either an action or 
an object: the object cards included things like family, Instagram, organizers, poster, and GPS; the action 
cards included actions like connect, edit, share, call, create, and open. Participants were also given blank 
cards and were told that they could write in any action or object they felt was missing. The card prompts 
were not included in the discussion portion of the AMC workshop because there was a greater degree of 
digital literacy in the AMC group so less scaffolding was needed: participants in the AMC workshop were 
media practitioners, who were more fluent in technological jargon and had a more nuanced understanding 
of the subtleties, affordances, and differences of various digital platforms. The AMC conversation was still 
anchored by actions and objects to help better articulate the interactions between activists and technology. 
By discussing the results as a group, workshop participants became more familiar with their use of 
technology within the context of their operations, so that they would be better able to redesign it. 

The first discussion activity laid the foundation for the second activity where we asked participants 
to create prototypes that explored the issues, concerns, and concepts that arose from the previous discussion. 
A variety of crafting supplies were available for participants to prototype with, such as pipe cleaners, craft 
paper, scissors, sticky notes, crayons, cardboard, hot glue guns, markers, and (as requested by one 
participant) googly eyes. The prototypes were not meant to be technically accurate, functional, or even 
necessarily plausible; they were intended to empower participants to further explore their conceptualization 
of how technologies might be designed to aid their activism. Throughout the design activity we worked to 
avoid leading participants with specific ideas or values, instead using the materials and discussion from the 
first session to help guide and direct participants through the process. Throughout these two activities, we 
encouraged participants to be imaginative and creative with their prototypes, aiming for pie-in-the-sky ideas 
that addressed particular issues or topics that came up. At the end of the workshop, when participants 
explained their prototype and how it worked, we were able to explore alternate futures in which a 
key concern or issue was resolved or shown in a new light. The workshops led us to formulate key research 
questions: How might low-fidelity prototyping materials and activities instigate critical and speculative 
ideas about technology design? What are the commonalities among envisioned alternate designs with regard 
to activist values and how might those be addressed by context-sensitive technology design? 


4 Related Work 


Critical design workshops operate in an existing ecology of community-focused work and 
technology-supported action research (Björgvinsson, et al., 2010; Merkel, et al., 2007; Merkel, et al., 2004). Within this 
body of literature, several studies have shown the complex relationship non-profit and community-based 
organizations have with ICTs (Le Dantec and Edwards, 2008; Le Dantec and Edwards, 2010). For example, 
Voida et al.’s study of service organizations specifically focuses on the use of “homebrew databases”, which 
are ad hoc mixtures of information organization technologies (e.g. spreadsheets) that arise from limited technical 
capacity and high staff turnover (Voida, et al., 2011). These organizational structures rely on volunteer 
efforts, not unlike activist organizations, though to a lesser extent. Voida et al.’s work looks at how this 
‘make do’ structure functions through the lens of information management, which is a necessary task for 
an efficient and functional organization, but plays a larger role in activist circles where the immediacy and 
urgent nature of the work makes knowledge management that much more crucial to the strength, 
cohesiveness, and often survival of the organization (Hirsch, 2009; Kuznetsov, et al., 2011). Information 
management (e.g. storing records, accessing data, or managing contacts), is complicated due to the 
specialized and specific knowledge often required across multiple levels of a non-profit organization (NPO); 
this is exacerbated in activist groups because volunteers also come into the organization with varied training 
backgrounds. Because of these knowledge discrepancies, information transfer is all the more crucial in 
establishing a coherent and communicative organization. Carlisle's framework emphasizes the transferring 
of knowledge across boundaries, many of which often extend beyond an activist organization’s concerns 
(e.g. political, legal, economic) (Carlisle, 2004; Le Dantec and Edwards, 2010). The question is not just 
limited to how to effectively transfer specialized information within a group, but how to proceduralize this 
transfer to adapt to the high volunteer turnover rate. The shift in focus from a single organization to 
multiple organizations is a crucial one that Le Dantec writes about, particularly in the public sector. Both 
activist and public sector work often require collaboration among multiple entities; the work of Le Dantec 
and Edwards elucidates the subtleties of these relationships through the interplay of power dynamics, 
influence, and scale (Le Dantec and Edwards, 2010). 

The concept of community is not an unproblematic one as there are multiple different dynamics 
and relationships that are encapsulated by ‘community’ where multiple and varied publics, interests, and 
social and cultural practices must coexist (DiSalvo, et al., 2010; Ribes and Finholt, 2008). It is not just the 
organizational structure that is in flux, but the opinions, concerns, and values of its members, as well. 
Community-based research is political in its connections to local entities (e.g. government, community 
stakeholders), but it is also internally political as it cannot—and should not—be assumed that the members 
of a single organization are homogenous and unvaried (Le Dantec, et al., 2011; Le Dantec and Edwards, 
2008). 

In this context, the workshops described in this paper are ways of addressing these challenges as 
potential opportunities for expression and discourse (DiSalvo, et al., 2008). Activist work is contestational, 
both within an organization and outside it. In looking at and discussing how technology is used in activist 
practices, these design interventions provide a space for participants to express their concerns and desires 
and to create arguments that are not present in existing technological paradigms. The ways that workshop 
participants talked about technology became a way to make visible values and practices shared among 
different activist organizations and stakeholders, such as designers, academics, and technologists more 
broadly. 


5 Participants 


The workshops were held with two distinct groups of activists. The first group was a local activist 
organization that focuses on housing justice in the city of [redacted]; the second was a mixed group of 
activists attending the Allied Media Conference (AMC is an annual conference in the U.S. for media 
practitioners and social activists). These two groups offered different views and on-the-ground perspectives 
for how technology figured into their work and how they might further use new forms of digital and social 
technologies to promote their work and their causes. 

Two of the authors have been involved with OOHA, our first site, since August 2012. Our 
involvement with the organization has been built around extensive ethnographic fieldwork based on 
participant observation (Dombrowski, et al., 2012): we have been included in retreats, canvassing, weekly 
meetings, and have worked in different administrative capacities (e.g., note taking, data entry); we also 
attended major actions, such as marches, court auctions, and press events and have become more involved 
in an educational campaign by helping build a data visualization tool, and creating and documenting 
internal procedures. OOHA emphasizes non-violent direct action strategies to confront and engage with 
larger institutions that play roles in housing issues, like banks and local government. Direct action is 
typically targeted at a resident’s individual housing struggle, known as a “campaign,” which can take the 
form of demand letter deliveries to relevant stakeholders (e.g. demanding a loan modification from bank 
officials), protests at local home auctions (to dissuade potential investors from purchasing foreclosed homes), 
or home liberations (where a resident ignores the official eviction notice and remains in their home). 

Our second site, the AMC workshop, had participants from diverse backgrounds who were asked 
to briefly speak about their experiences as activists. A trio of participants were in the same organization 
and all did work focused on young Muslim women and photography, using photography lessons as a way 
to encourage the women to articulate their experiences of being Muslim in a contemporary western city. 
Another participant also did work with young adults: hailing from Brooklyn, she was part of an organization 
that ran after-school art programs. These participants shared similar uses of digital platforms, such as 
Dropbox for collecting visual resources (photographs, etc.) and social media (Facebook, Twitter) for 
broadcasting upcoming events. They were also fairly familiar with the platforms that their respective 
student groups used, understanding how different affordances correspond to particular kinds of uses (e.g. 
Instagram is popular among students, but not useful for communicating textual information). 


6 Prototype Themes 


Our inductive qualitative analysis revealed several thematic patterns that emerged from the artifacts and 
participant discussion produced from both workshops. We will focus on the themes of subverting authority, 
contingent communication, and sustainability because they demonstrate the unique challenges faced by 
activist organizations. The themes were directly informed by events that participants had experienced in 
the past and reflected concerns that were specific to an organization's activities. What the inventions made 
salient was that activist work is not just considered as a political act, but also situated, physical, and 
corporeal: the materiality of activism is something that was deeply considered. 


6.1  Subverting Authority 


Given that many OOHA actions are confrontational, it was unsurprising that subverting authority was a 
theme that emerged from that group’s prototypes. However, we found that this theme was also present in 
prototypes from the AMC workshop; subversive strategies are not necessarily limited to direct action and 
can be demonstrated through other means. Participants expressed a variety of disruption tactics, which 
we've categorized into avoidance, resistance, and aggression. Avoidance tactics are more passive and are 
directed towards minimizing interactions with authorities. This could manifest itself as a reaction, like 
escaping encroaching authorities, or as a preventative measure. 


Figure 1: The participant points out the aquaponic system contained in each Aquaponic Indoor Restroom, 
which would be the only thing visible to police in order to disguise the resting activist inside. 


6.1.1 Avoidance 


One OOHA prototype called the Aquaponic Indoor Restroom demonstrated preventative avoidance. The 
participant explained that the Restroom was designed as a comfort space for activists to rest and nourish 
themselves, containing a bed, toilet, and sustainable garden powered by rooftop solar panels. However, only 
the garden would be visible from the outside, disguising the resting activist inside. Avoidance was built into 
the design by obscuring the activist’s location, which in turn allowed them to evade the authorities. 

Avoidance tactics were also present at the AMC workshop: the Turtle Con-tent was a tent where 
activists could gather during an action to regroup and discuss strategy, which used “magical future 
technology” to teleport and physically evade authorities that had discovered its location. The concept of 
avoidance was not necessarily limited to physical confrontations, however; it was also a way to prevent 
access to privileged information. This was best demonstrated through the Analog Torrent prototype, which 
imitated the torrenting digital distribution model by using the decentralization of information as an 
avoidance tactic. The Analog Torrent had color-coded ‘receiving stations’ and pompoms. A pompom 
represented a single message and its color corresponded with the color of its receiving station. The pompoms 
were broken up to travel along different paths of a ‘web’ to be re-formed into the entire original message at 
its destination receiving station. If a single portion of the message was intercepted by authorities somewhere 
along the web, then the entirety of its contents would not be revealed. 
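
The fragment-and-reassemble idea behind the Analog Torrent can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the participant built a craft prototype, not software, and the names `Fragment`, `split_message`, and `reassemble` are hypothetical. The sketch deals the characters of a message across several paths so that no single intercepted fragment contains the whole message, although each fragment still exposes part of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One piece of a message (hypothetical stand-in for a piece of a pompom)."""
    station: str  # color of the destination receiving station
    index: int    # which path this piece travelled along
    text: str     # the characters carried on that path


def split_message(message: str, station: str, paths: int) -> list[Fragment]:
    """Deal the characters of `message` round-robin across `paths` fragments."""
    if paths < 1:
        raise ValueError("paths must be at least 1")
    return [Fragment(station, i, message[i::paths]) for i in range(paths)]


def reassemble(fragments: list[Fragment]) -> str:
    """Rebuild the original message once every fragment has arrived."""
    ordered = sorted(fragments, key=lambda f: f.index)
    paths = len(ordered)
    total = sum(len(f.text) for f in ordered)
    chars = [""] * total
    for f in ordered:
        # Put each fragment's characters back into their original positions.
        chars[f.index::paths] = list(f.text)
    return "".join(chars)
```

A receiving station would call `reassemble` only after all fragments for its color have arrived; the order of arrival does not matter because each fragment carries its path index.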


6.1.2 Resistance 


In contrast to avoidance methods, resistance methods could be seen as more of a reaction against authorities, 
though this does not necessarily entail direct action. The Geographic Hashtag prototype from the AMC 
workshop was an example of this: symbols were either looped or tied to physical structures. The symbols 
corresponded to different messages, which were only known to Geographic Hashtag users, thus marking 
physical locations with a particular meaning. The participant explained a use case for a specific kind of 
Hashtag where the bracelet contained an eye symbol on it, indicating that there was bullying in the area. 
The ‘authority’ in this context is not an institutional one; other entities can also act as threatening forces 
to evade. The bully Hashtags were a direct reaction to an ongoing hostile situation and, as explained by 
the participant, warned others to keep a watchful eye out in case they needed to intervene, thus operating 
as a community resistance united against an acknowledged aggressor. This logic was also found in another 
context through an OOHA example called Furniture Freeze. 


Figure 2: The participant adjusts the blanket on a resident occupying a home during a home liberation. 
When active, the prototype would freeze furniture in place, preventing forcible eviction. 


The Furniture Freeze prototype was a small diorama of a living room with a button on the wall. When 
pressed, the button would freeze furniture and objects in place to disrupt the eviction process. The designer 
pointed out that this technology is specifically for a home liberation action such that it would allow activists 
to stall for time in order to build a blockade and gather support from other members and allies. During a 
home liberation, there is the constant threat that authorities will forcibly remove residents from their homes, 
as well as their belongings, which Furniture Freeze directly opposes. 


6.1.3 Aggression 


Due to the confrontational nature of direct action, there is also the possibility for aggressive reactions to 
authority. These are not necessarily intentional, but can arise as a result of the circumstances of a particular 
situation. There was only one prototype that intentionally worked aggression into its design and it emerged 
from the OOHA workshop. Bionic Dogs had two cardboard silhouettes of robot dogs that were trained to 
attack different authorities. The participant focused on police officers and bank officials. The latter target 
is an interesting one insofar as bank officials are rarely, if ever, in the role of the direct aggressor. Actions 
that involve bankers are initiated by OOHA members, such as marches or rallies, and these actions are 
consistently framed as peaceful and non-violent. Bionic Dogs is an example of radical activist politics that 
are specific to an individual and diverge from those of the organization. It is in this vein that we can read 
these prototypes as metaphors for the varied perspectives and strategies within activist organizations. Even 
though members can be united behind a similar cause, they often have varying goals and even more 
disparate methods for meeting those ends. 


Figure 3: Each Bionic Dog breed is trained to attack a different authority. 


6.2 Contingent Communication 


Due to the unpredictable and spontaneous nature of activist work, it is difficult to fully prepare, or even 
properly anticipate, the challenges that may arise. Often, a crisis will emerge that requires immediate action 
from a group’s members. In this context, communication is contingent on the status of the action: a crisis 
will require more urgent means of communication than a member meeting. With these prototypes, the 
emphasis was on access control, ensuring that private communications to members and allies were not intercepted, 
as well as urgency, to manage communication priority and calls for participation. 


6.2.1 Access Control 


One primary communication concern across both workshops was access: how do activists restrict access 
such that information remains private while still delivering a message to those privy to it? In the earlier 
Analog Torrent example, this was done through a physical web that distributed and decentralized a message 
to protect its contents from potential interceptors. This was also conveyed through color coordination: the 
participant explained that each color represented a different form of social media. The prototype focused 
on inclusive access by ensuring that the same message could be delivered to individuals regardless of which 
form of communication they used. Blue pompoms, for example, represented Facebook communication, while 
orange pompoms delivered tweets. An OOHA prototype approached this issue through a focus on exclusive 
access. The Robot Bird is an airborne device with a built-in projector to deliver messages: “it’s a good way 
to get the word out, what’s going on, and look at this really messed up thing the cops are doing right now.” 
The participant explained that the Robot Bird had chameleon properties, which helped restrict access to 
the message that was only intended for a specific audience. 


Figure 4: The prototype has both a brick pattern and a leafy green pattern to demonstrate its chameleon- 
like properties. 


6.2.2 Urgency 

In addition to the content of a message, both workshops drew out and emphasized the issue of urgency: 
information needs to be communicated quickly and directly to a large audience who is distributed 
throughout a geographic space. Two designs, the Constant Contact card and the Bam! Button, addressed this 
issue in similar ways. The former was created at the AMC workshop and was a piece of construction paper 
slightly larger than a business card. The participant explained that in her work with students, there are 
often different disconnects in the mode of communication between students and faculty, parents and 
students, and parents and faculty. Communication takes place through various means: Twitter, Facebook, 
SMS, etc. The Constant Contact card displays messages specific to that particular organization so that 
students, faculty, and parents all have a consistent channel of communication. Urgency is demonstrated 
through access: card holders will receive an urgent message immediately, rather than experiencing delay by 
having to check multiple social networks or other channels. The OOHA prototype, the Bam! Button, sends 
messages through multiple channels simultaneously to reach its audience as quickly as possible: “just one 
button—bam!—and it set the sequence off and we didn’t have to worry about a phone tree malfunctioning 
or whatever.” During group discussion at the OOHA workshop, participants acknowledged that internal 
communication is an effective way to maintain and strengthen membership ties, but too much 
communication could risk information overload or volunteer burnout. The Bam! Button offers multiple 
priority levels so that its user can communicate both time-sensitive and low-impact messages. This 
ensures that organizational members are not consistently bombarded with messages and can manage their 
participation without feeling like they are on call for the organization at all times. 

Figure 5: The Bam! Button user can select a priority level before sending a message to an organization’s 
members. 
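
To make the Bam! Button's combination of multi-channel delivery and priority levels concrete, the following Python sketch shows one way such behavior might be modelled. It is an assumption-laden illustration, not a description of anything the participant built: the `Priority` levels, the `Member` record with its `min_priority` threshold, and the `press_button` function are all hypothetical names, and delivery is simulated by appending to an in-memory inbox rather than contacting any real service.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priority levels a sender could select before pressing."""
    LOW = 1
    NORMAL = 2
    URGENT = 3


@dataclass
class Member:
    name: str
    channels: list[str]                     # e.g. "sms", "twitter", "facebook"
    min_priority: Priority = Priority.LOW   # lowest level this member wants to receive
    inbox: list[tuple[str, str]] = field(default_factory=list)


def press_button(members: list[Member], text: str, priority: Priority) -> int:
    """Fan one message out over every channel of each member who opted in.

    Returns the number of simulated deliveries. Members whose threshold is
    above the message priority are skipped, which limits message overload.
    """
    delivered = 0
    for member in members:
        if priority < member.min_priority:
            continue
        for channel in member.channels:
            member.inbox.append((channel, text))
            delivered += 1
    return delivered
```

In this sketch a member who sets `min_priority=Priority.URGENT` would hear only about crises, while others would also receive routine updates. A real system would send over the channels concurrently; the loop here is sequential for simplicity.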


6.3 Sustainability 


We drew out different interpretations of what it means to maintain the momentum of a movement. We 
identified three different categories of sustainability: environmental, organizational, and personal. Issues 
around environmental sustainability follow an expected arc in which participants were sensitive to the 
resources used and the waste produced through their work. The notion of stamina is a useful way to frame 
concerns in organizational and personal sustainability where maintaining momentum and emotional 
connection to the issues and to the individuals became a clear challenge. 


6.3.1 Environmental Sustainability 


Environmental sustainability was present in many prototypes, which included renewable resources or self- 
sustaining ecosystems as a way of minimizing resource consumption. The OOHA Aquaponic Indoor 
Restroom was not only powered by solar panels, but also had a closed loop system where water and nutrients 
from the toilet were used to sustain the garden. Prototypes from the AMC workshop were similarly built 
around environmental sustainability: both the Turtle Con-tent and the Constant Contact card powered 
themselves through renewable sources, using wind turbines and solar panels, respectively. 


6.3.2 Organizational Sustainability 


Organizational sustainability is more about maintaining the momentum of the entire group after an action 
or, alternatively, about regaining the momentum if a particular action or campaign does not reach an ideal 
conclusion. The main way that this takes place is through communication: if members are not kept up-to- 
date with campaign updates, then this could lead to member disengagement. The Bam! Button and the 
Robot Bird both address this, as discussed above. A third OOHA prototype, Insect Media, is a robotic 
insect that carries messages around the city to be delivered to members at home. The different colored 
pompoms represent different priority levels and different kinds of required action. This prototype highlights 
the diversity of organizational messages as well as the various roles that are required within a single 
organization. While some Insect Media messages are calls to action, some are member updates or deliver 
tasks that need to be completed. Not every member needs to be a home liberator; an organization is 
maintained through multiple roles performing different tasks simultaneously. 

Figure 6: The different colored pompoms represent different types of messages or different calls to action to 
accommodate the different roles people play within the same organization. 


An AMC prototype also echoed the importance of internal communication among members. Three 
participants grouped together to create Ignorance Glasses. It was inspired by Muslim djinn, which are only 
visible to people based on particular behavioral traits. For example, especially greedy people will be able to 
see greedy djinn, or evil spirits. This conditionality was built into the design of Ignorance Glasses: the 
wearer is able to see the biases and prejudices of the person they are looking at. By wearing the Glasses, 
members of an organization are able to better understand and empathize with each other, thus creating 
deeper bonds to strengthen the group as a whole. Organizational sustainability, through this prototype, is 
maintained through the interpersonal relationships within the group, building trust and respect in order to 
support a more unified and united membership. 


Figure 7: Ignorance Glasses develop empathy and help others see the prejudices and biases that might 
impede organizational trust. 


6.3.3 Personal Sustainability 

Issues of emotional or mental well-being are categorized as questions of personal sustainability. During an action, 
for example, self-sustaining systems can minimize resource consumption, but do not necessarily address 
corporeal durability: how does an individual remain physically and mentally sound during an action that 
could last a number of hours, if not days? This spans a number of issues including nourishment, personal 
comfort, and mental stability. The inventions highlighted the importance of individual activists being 
healthy and physically capable of lasting through an action. This concern arises due to the unpredictable and often 
volatile nature of actions; certain kinds of infrastructure that are taken for granted in non-activist contexts, 
such as electricity or running water, are rarely available in protest spaces. Beyond this, it is even more 
difficult to try and plan for an action to exist in a space where these facilities are accessible and available. 
The Aquaponic Indoor Restroom offers a bed for recovery and comfort, as well as a toilet, which addresses 
the very personal (and relevant) issue of human waste. An OOHA prototype called the Portable Sustainable 
Toilet offered a collapsible personal toilet made of recyclable materials. The cardboard model included a 
toilet bowl, seat, and lid, as well as a built-in toilet paper roll holder. A biodegradable bag was attached, 
as seen in Figure 8, which could then be disposed of nearby. The participant highlighted privacy and 
convenience so that an individual activist could be comfortable and healthy without depending on 
pre-existing infrastructure. 


Figure 8: The Portable Sustainable Toilet is made entirely of recyclable materials (the participant asked us 
to pretend the plastic bag was biodegradable). 


Personal stamina, while directly relevant, is not limited to direct actions or outdoor events. An AMC 
prototype, the Activist Apron, featured a number of different pockets with different organizational flyers 
and materials contained within them. While the participant did not physically build her prototype, she 
described the main feature of the Apron as the ability to hand it off to another activist, thus sharing the 
labor. When occupying a particular role in an organization, it can be difficult if there are not any other 
members who can do the same work, which can lead to personal stress and anxiety. The Activist Apron 
demonstrated how the distribution of work can help ensure the mental and emotional stability of individual 
members in order to help them continue their contributions to a larger movement. 


7 Discussion 


The themes reflected shared challenges and concerns across activist organizations and strategies for how to 
address them. The exploratory alternatives revealed both the benefits and the shortcomings of increased 
technology use in activist work. Participants’ experiences were crucial in providing insight into the different 
kinds of shortcomings or limitations of consumer technology or platforms. These experiences also extended beyond some of 
the more traditional or higher concept notions of ‘problems’ in activism work—such as funding, technology 
access, and technology education—to offer perspective into questions that are only raised through the more 
minute, day-to-day activities. 


7.1 Design Inspirations 


Many of the prototypes also suggested a focus on internal organizational action. Many conversations position 
technology use in activist work as reactive: a group will use media to directly challenge mainstream media, 
whether through content (e.g. culture-jamming), access (e.g. hacking), or production (e.g. indymedia) 
(Lievrouw, 2006). However, many of the workshop discussions resulted in prototypes or platforms that 
encouraged greater participation from group members directed towards other members. Devices that
were concerned with communication were not framed around the disruption of mainstream media outlets;
sustainability prototypes did not siphon energy from existing infrastructure, but cultivated their own.
Authorities existed on a legal level, but most of the prototypes avoided confrontation on a media or digital 
level. One interpretation could be that activist design should privilege self-sustenance or autonomy as ideals, 
rather than developing in tandem with or in competition against corporate or mainstream designs or 
practices. Designing for organizational autonomy would emphasize features that minimally rely on existing 
infrastructure (e.g. solar power for renewable energy) and empower members to participate to different 
degrees (e.g. support roles or frontline disruptors). This kind of design allows activist groups to best prepare 
for the unique and contextualized challenges and crises that are specific to each organization’s work. 


7.2 Design Installations 


Before the workshops, we anticipated technology use to be largely infrastructural, like the use of
cloud services to share organizational knowledge. While this did appear in discussions, the prototypes were 
overwhelmingly integrated into activist practices. Consider the Bam! Button: its use is designed to be in 
situ, distributing messages as the need arises. A more infrastructural design might have instead offered a 
database of scripts or canned messages to deliver. The emphasis behind the design is not to set a better 
foundation for activist work or necessarily to document the results of a particular action or meeting, but 
rather the prototypes are themselves forms of direct action. This is not to say that documentation or 
preparation is not integral to the success of an activist group, but the prototypes focused on supporting 
action as it was happening. The prototypes acknowledged the tenuousness of direct action, like gathering 
participants or sustaining energy. No matter how much preparation or foundation is implemented through 
activist work, actions are still spontaneous, unpredictable, and volatile and design should focus on how to 
better embrace and support that instability. By incorporating technology into direct actions, the hope is to 
harness the digital affordances of existing platforms (like network connectivity and mobility) to address 
whatever shortcomings arise ‘in the moment.’ 


7.3 Design Incriminations


Many of the prototypes and workshop conversations acknowledged an insider status without much further 
critical reflection. There are staff, group members, residents, students, etc. who are all ‘inside’ the 
organization. Similarly, there are ‘outsiders’ to the group, be they antagonistic entities like banks, 
government officials, or police officers, or simply those who are not members. This binary does not account 
for varying and often fluid levels of participation both within and outside an activist group. Consider who 
the Robot Bird might deliver to, or who the Cop Watcher watches. Which entities are privy to
updates about an organization? What are the different ‘levels’ of updates that might be delivered? Similarly,
who or what does the Cop Watcher consider an authority? Will there be alerts for locally organized 
community watch groups as well as police officers? The insider/outsider dynamic is less dichotomous in 
practice and offers a number of considerations when designing for ‘the group.’ In attempts to strengthen 
internal relationships, there is the risk that a design will alienate those who are on the edges of the 
membership. Because activist work can be contestational and hostile towards perceived threats from 
outsiders, it is all the more crucial to interrogate what constitutes a ‘member’ or ‘insider’ in order to avoid 
re-directing that hostility internally. 


8 Future Work 


The discussions and subsequent prototypes underscore the importance of context: even though platforms 
or technologies are used in similar ways across multiple activist groups, there is a degree of adaptability 
and flexibility that needs to be considered. The context is provided through participants’ experiences: their 
activist work was enacted and embodied in a particular time and place and their prototype designs reflect 
those specific experiences. Literature discussing technology use in activist organizations already 
acknowledges the importance of context: Saeed, Rohde, and Wulf discuss mailing list usage that can deliver 
more general information to members, but could be more useful with recommendation algorithms to better 
distribute more specific knowledge to members with more particular kinds of expertise (Saeed, Rohde, Wulf, 
2011). We contend that design interventions for activist organizations need to be informed by the specific 
work that they do. Because activists participate in different ways—from support roles to communication 
coordination to civic obedience—these varied experiences require contingent and adaptable designs. Vines, 
et al. argue for designs that afford multiple forms of engagement within a single project; this avoids 
privileging a single type of participation and creates a richer and more inviting environment for participant 
contribution (Vines, et al., 2013). Experiences are a means of sense-making, where activists develop deeper 
understandings of their work and the work of other members within the same group (Leong, et al., 2010). 
The sense-making came out through workshop discussions of imagined prototypes; participants did not aim 
to find the ‘ideal’ solution to a common problem, but rather discussed the benefits and consequences of 
different alternatives. Further critical reflection on the specificities of each activist group’s practices, 
processes, and preferences can better determine what kind of design will be more conducive to their work. 


Abstract

Although research has shown increasing potential for new media literacy and identity development 
through the use of social networking tools, there are limited opportunities for young people under age 
thirteen to legally take part in these environments. We challenge the dominant narrative that young 
people under thirteen need constant adult surveillance and are incapable of safe online
practices. Instead, we present a potential solution through the design of a safe, virtual learning space for
tweens that integrates community-based rules and moderation. In partnership with the National Park 
Service, which is committed to having their virtual learning space accessible to all ages, we collaborate 
with a group of tweens and their parents by using bonded inquiry and focus group methods. We collate 
the needs, concerns, and online practices of these tweens and their parents to develop a preliminary 
design of a cyber-safety framework that learning institutions can employ to allow tween participation. 
By focusing on building a resilient online community of tweens, parents, and site developers, the 
framework emphasizes the value of an online environment that balances freedom and protection of tween 
privacy. 


1 Introduction


As access to the Internet increases around the world, research has found that young people are increasingly 
taking part in a participatory online culture that includes creation, collaboration, and sharing of information 
with their peers and the general public. These emerging virtual spaces provide critical environments in 
which young people can develop new media literacies (Jenkins, 2006; Ito et al., 2010). Such virtual learning 
spaces may incorporate social networks, blogs, games, storytelling, virtual worlds, and other features that 
allow interpersonal interactions. In fact, significant social networking practices may well occur within the 
games, storytelling, virtual worlds, and other non-traditional¹ sites in which young people participate.
Grimes and Fields (2012) refer to this inclusive range of online activities and practices as social networking
forums (SNF).² Although SNFs provide learning environments accessible to diverse populations, there are
limited opportunities for young people under age thirteen to legally take part in such environments (Grimes
& Fields, 2012). However, access to these sites prior to age thirteen, and instruction on how to use such
tools appropriately and safely, is often omitted from public school instruction (Jenkins, 2006; Ahn,
Bivona, & DiScala, 2011). Before age thirteen, young people are not legally permitted to engage in these
SNFs. In reality, however, many young people ages nine to twelve, known as tweens (Rideout, 2007), are
engaged in mainstream SNFs. Additionally, due to the development of social media platforms and online
social norms, tweens are making much more personal information public now than they did six years ago
(Madden et al., 2013). In this evolving media landscape, it is critical to introduce tweens to SNFs in a
manner that protects them and empowers safe behavior. Numerous studies highlight the disparity in new
media literacy skills among young people (Ahn et al., 2012; Bennett, Maton, & Kervin, 2008; Foss et al.,
2012), potentially affecting how they handle matters of online privacy and confidentiality. Although we
understand the hesitation to encourage tween involvement in SNFs, the trend of increased tween use of
social media sites indicates that tweens will continue to engage in these environments regardless of whether
they receive adequate privacy education or parental supervision (boyd, 2004; boyd, 2007; Lenhart, Purcell,
Smith & Zickuhr, 2010; Madden & Zickuhr, 2011; Rideout, Foehr & Roberts, 2010; Livingstone et al., 2011).
We work from the assumption that tweens can be empowered to be safe online and take on a leading role in
promoting safe community spaces in SNFs. We therefore must consider the diverse experiences that young
people have with SNFs, the concerns that a parent/guardian has with their tween’s SNF practices, and the
role that a parent/guardian chooses to play in their tween’s online life.

1 Traditional social networking sites are sites such as Facebook, Twitter, Instagram, and Myspace.
2 Grimes and Fields (2012) defined social networking forums “as a particular online forum or web-enabled platform containing
technological affordances that enable forms of communication between users, the creation of personal profiles, and the production of
networking residues while enacting hierarchies of Access...” (p. 55). We use this definition of social networking forum throughout this
paper, to be inclusive of the range of online activities that are inherently “social.”

This study proposes a cyber-safety framework that can be used to design safe SNFs for tweens and 
young people through community-based rules and moderation. Although this study targets the needs and 
educational goals of learning institutions, the resulting framework can be adapted by commercial 
organizations interested in designing SNFs that target young people under the age of thirteen. The cyber- 
safety framework is developed based on the examination of findings to the following research questions: 


1. How do tweens navigate privacy and cyber-security restrictions put in place by website 
administrators based on legal policies? 

2. What are the concerns that parents have about their tweens’ social networking practices? 

3. What are the roles that parents play in their tweens’ social networking practices? 


In partnership with the National Park Service (NPS), we took preliminary steps to create a safe SNF that 
enhances the experiences young people have with the Junior Ranger in-person program 
(http://www.nps.gov/learn/juniorranger.cfm) and the online activity center, WebRangers 
(http://www.nps.gov/webrangers/). Social networking features such as media sharing and collaboration are 
currently not available in WebRangers. Inspired by WebRangers’ current participants’ desire to share and 
socialize, we worked with tweens (aged 10 through 13) from various socio-economic backgrounds to co- 
design the features for an engaging WebRangers environment. In addition to this, we co-designed the cyber- 
safety framework with these participating tweens and their parents. The focus of this paper will be on the 
latter goal of constructing a cyber-safety framework for SNFs that target tweens. 

We challenge the dominant narrative that young people under thirteen need constant adult 
surveillance and are not capable of safe behavior in virtual learning environments. Instead, we present a 
potential solution through the design of a safe virtual learning space for tweens that integrates community- 
based rules and moderation. 


2 Related Work: Social Networking Forums, New Media Literacy, Privacy, and Parent 
Engagement 


While recent studies vary in their definition of social media, the upward trend in SNF use has been
evidenced in numerous reports describing the behavior of young people (boyd, 2004; boyd, 2007; Lenhart
et al., 2010; Madden & Zickuhr, 2011; Rideout, Foehr & Roberts, 2010; Livingstone et al., 2011). Specifically,
surveys conducted by EU Kids Online II (Livingstone et al., 2011) found that 38% of children aged 9-12
years from various European countries have their own profile in social networking sites, and Lenhart et al.’s 
(2010) Pew Internet and American Life Project study reports that 46% of surveyed 12-year-olds have used 
a social networking site. 

The Children’s Online Privacy Protection Act (COPPA) is legislation that restricts websites
from collecting personal and identifying information from kids under age thirteen (Children’s Online Privacy 
and Protection, n.d.). COPPA requires site operators who target children and tweens to obtain verifiable 
parent consent if any personal information is collected. Although the legal age to participate in social 
networking sites such as Facebook and Myspace is thirteen, a study conducted by Consumer Reports quoted 
in Grimes & Fields (2012) found that 7.5 million of the 20 million minors using Facebook are under the age of
thirteen. Tweens are able to find ways around age restrictions, typically by lying about their age (Grimes
& Fields, 2012; Lenhart et al., 2011; Livingstone, 2008; Steeves, 2006). Studies are still scarce on this topic,
making it difficult to understand what tweens do on these sites and how parents are involved (Grimes & 
Fields, 2012). 

In this era, where participating in traditional social networking sites and other means of active 
online engagement has been linked to advancement in new media literacy (Ito et al., 2010; Jenkins, 
Purushotma, Clinton, Weigel, & Robinson, 2006) and identity development (boyd & Ellison, 2007; 
Livingstone, 2008; Regan & Steeves, 2010), researchers have begun examining COPPA’s unintended 
consequence of dissuading children from participating in these informal learning environments (Grimes, 
2008). Under the pretext of complying with COPPA, many sites simply prohibit tween participation and do
not even attempt to confront the challenges of meeting COPPA requirements. Situating this challenge 
within the context of informal learning institutions that have been recognized as third places of learning 
(Watson, 2010), we strongly feel that these institutions must tackle these issues to avoid the “digital divide” 
that is currently prevalent. The “digital divide” is no longer about access, but instead the depth of 
engagement and level of participation (Hassani, 2006); it is heavily influenced by socio-economic status, 
race, ethnicity, gender, parent education level, and household income (boyd & Hargittai, 2013; Thompson, 
Subramaniam, Taylor, Jaeger & Bertot, in press; Warschauer & Matuchniak, 2010). 

Navigating the complex privacy and cyber-safety structure requires the mastery of new media 
literacy by tweens, strong parent engagement, and commitment from the developers of virtual spaces and 
SNFs to make these spaces safe. In Livingstone’s (2008) study, where she interviewed teens between the 
ages of 13 and 17, she found that “teenagers described thoughtful decisions about what, how and to whom 
they reveal personal information, drawing their own boundaries about what information to post and what 
to keep off the site, making deliberate choices that match their mode of communication (and its particular 
affordances) to particular communicative content” (p. 404). However, the teens in her study also lacked the 
media literacy needed to manage these privacy settings and listed the operation of privacy settings as one 
of the priority areas that need to change in social networks (Livingstone, 2008). 

Family dynamics play a crucial role in tweens’ activities and experiences online (Lenhart et al.,
2010). There has been some research on parental involvement in a child’s engagement with media that 
studies the impact of co-viewing, co-reading, and intergenerational play (Takeuchi & Stevens, 2011; 
Williams & Merten, 2011). In studies with young children (between the ages of 3 and 10), families impose 
a variety of rules and practices at their homes (Takeuchi, 2011). They also monitor their child’s activities 
online by friending them on social media, checking on the websites that their child has visited, and blocking 
or filtering specific content (Lenhart et al., 2011; Rideout, 2007). In the Parents, Children & Media: A 
Kaiser Foundation Survey, parents of 9- to 17-year-olds believe that they know “a lot” about what their 
kids are doing online, but also feel that the tools they use to exercise parental control are far from perfect 
(Rideout, 2007, p. 10). In studies with older children, Ito et al. (2010) find that young people generally
perceive the rules imposed (such as blanket prohibitions, technical barriers such as filtering, and time limits)
as “raw and ill-formed exercises of power” (p. 343). However, Lenhart et al. (2011) found that 86% of older 
children (ages 12 to 17) still list parents as their best choice for advice about challenging online experiences. 

Due to COPPA and the emerging strong interest in cyber-safety, sites aimed towards younger 
children and tweens employ a variety of administrative and community moderation mechanisms that shape 
the culture of engagement and empowerment on the site. Some sites for tweens and younger children have 
an automated system that blocks certain words, removes postings, applies filters to chat-based 
communication, and limits the type of friendships that one can have with adults. Some sites have dedicated 
employees who screen for messages considered inappropriate and block users for revealing their real 
identities (such as real names versus screen names). Some sites rely more on parents to select from access 
options provided for their children and limit their child’s interaction with other users (such as limiting the 
chat words or sets of words or icons that can be used) (Grimes & Fields, 2012). 

We take advantage of our partnership with NPS to examine how tweens, their parents, and learning 
institutions committed to virtual engagement and cyber-security can join forces to create a cyber-safety 
framework. This study begins exploration of this potential partnership by collating the needs, concerns, and 
online practices of tweens and parents to develop a preliminary cyber-safety framework that learning 
institutions can employ to allow for tween participation. We challenge the dominant ethos that points the
finger at any one of these entities as holding sole responsibility for the safety of tweens.


3 Settings and Methodology 


The research team was approached by the personnel from NPS with the task of constructing a safe SNF to 
augment and enhance the experiences that young people nationwide have with the Junior Ranger in-person 
program. While NPS values the impact of such socio-technical systems to advance their mission, visibility, 
marketing, and international tourism, they remain cautious about such an initiative for three primary 
reasons: 


1. NPS was uncertain about SNFs and the features necessary to address the needs and interests of 
younger children, especially tweens; 

2. The current traffic to WebRangers includes people of all ages. Thus, NPS was unsure of the cyber- 
security features needed to maintain the safety of the younger children on the site; and 

3. Similar to other federal government agencies, NPS is vigilant about the personnel time that must 
be invested in the implementation of SNFs. They emphasized their inability to dedicate extensive 
personnel time to monitor or moderate the virtual learning space (such as the moderation practices 
in Scratch’s online community).³

3 Scratch is a programming language for kids ages 8-16, featuring an SNF where users can post their work and interact and collaborate
with others. An explanation of the Scratch moderation system is available at http://scratch.mit.edu/parents/.


Using these concerns as pivotal considerations, we designed a study that included identification of the SNF
features that tweens might be interested in and the creation of a framework for social interaction that
promotes and maintains cyber-safety. This paper focuses on one small portion of the larger study,
highlighting findings pertaining to the creation of a cyber-safety framework that can be implemented in the
newly proposed NPS site and by other organizations that are interested in setting up such a space for their
communities or patrons.

We examined the various participatory design methods that were available when working with
children, such as informant design (Scaife, Rogers, Aldrich, & Davies, 1997), cooperative inquiry (Druin,
1999, 2005; Guha et al., 2005), and bonded design (Large, Nesset, Beheshti & Bowler, 2006; Large, Bowler,
Beheshti & Nesset, 2007). Upon close examination, we decided to utilize the bonded design method to
gather input from tweens on how they navigate legal policies and manage their information and privacy on
SNFs. Large et al. (2006) define the bonded inquiry method as situated between informant design and
cooperative inquiry, essentially drawing on the strengths of various design-based methodologies to create
low-tech or working prototypes for technology design. Large et al. (2006) describe bonded design inquiry as:


...[a] means of bringing together a team that unites in diversity. It brings together adult experts in
design and child experts in being children, who work together throughout the design process. Like 
cooperative inquiry, it emphasizes an intergenerational partnership in working towards a common 
goal and the idea that children should play an active role in design rather than merely being 
evaluators or testers at the end of the design process. It does question, however, the nature of the 
cooperation between adults and children within the team. In this respect, it shares some of Scaife 
et al.’s (1997) reservations concerning the extent to which true equality can exist within an 
intergenerational team. At the same time, however, bonded design differs importantly from 
informant design in its inclusion of children throughout the design process and as full team 
members. It also rejects Scaife’s view that children are most helpful at suggesting ideas only for 
motivational and fun aspects. (p. 79) 


We adopted the bonded design methodology for three primary reasons. First, we believe in the power of 
children involved as full design partners, as a means of attaining the true perspective of children. Secondly, 
we engaged diverse tweens who have varying new media literacy skills and familiarity with SNFs. Thus, 
although we strived for equal partnership between the tweens and ourselves (the adult partners), the adult 
partners had to set the agenda and maintain the direction and organization of the sessions. Lastly, due to 
the short length of the study (the study was restricted to five months as a result of financial and scheduling 
constraints), we were only able to participate in two co-design sessions (90 minutes each), which essentially 
required the adult partners to pay attention to specific aspects of the research to ensure that the research 

goals were achieved. Although the atmosphere of the co-design sessions was 


informal, we set time limits for each activity and brought things to order when 
necessary. In such cases where design sessions cannot evolve into an equal 
partnership and for an extended period of time, bonded design is most 
appropriate (Large et al., 2006; 2007). 

Tweens were recruited from local public schools in the Washington, 
D.C, and Maryland area, and we paid special attention to the socio-economic 
distribution of these families. Four of the seven participating tweens receive 
Free and Reduced Meals (FARMS) at their schools. FARMS is a common 
indicator of poverty rate in schools in the United States (U.S.). We brought 
these seven tweens and their parent (all tweens were accompanied by one 
parent, with the exception of two tween siblings who came with both of their 


parents) to Kenilworth Park and Aquatic Gardens in the Washington, D.C, 


area to participate in activities inspired by the park’s Junior Ranger activity 
Figure 1: Tweens 4 
eee 3 booklet. 
participating i activity Following the activities in the physical park space, the tweens took 
with Ranger at the 
Kenilworth Park and 


Aquatic Gardens 


part in bonded design activities that allowed them to create a low-tech 
prototype of the virtual learning space that would act as an extension of their 
experiences in the park. We utilized a variety of co-design techniques to 
engage tweens to obtain feedback pertaining to the larger study, including 


4 Most of the national parks in the U.S. participate in the Junior Ranger program. It provides guides to young adults to enhance their 
in-person park experiences, through activities delineated in a booklet that allow them to earn physical badges. 
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bags-of-stuff® and sticky noting® (Guha, Druin & Fails, 2012). The tweens came up with a variety of SNFs 
(with very involved contribution by the adult design partners) that can be integrated into WebRangers, 
including virtual parks, massively multiplayer online games, scavenger hunts based on virtual park
interactions, a scrapbook, avatar creation, and live webcams in the park. Tweens wanted these forums to 
be connected to each other and reactive to user activity (for example, earning a badge through a scavenger 
hunt will allow the badge to appear in their scrapbook). Tweens also discussed desired social media elements, 
including a personal profile, avatar, and newsfeed, as well as the ability to contribute information about 
various parks, connect to friends, and subscribe to other users and discussion topics (similar to the friending 
in Facebook and following activity on Twitter and Instagram). Whenever appropriate, we engaged tweens 
in a discussion of their current online and privacy activities as they designed and described their prototype 
of the virtual learning space by intermixing prompting, brainstorming, and critiquing as stipulated in the 
bonded design method (Large et al., 2006). A concurrent, separate parent focus group (based on guidelines 
by Morgan (1988)) generated discussion regarding parents’ perspective on tween engagement in social 
media, safety concerns, and family practices.

A month later, a second bonded inquiry session reconvened the tweens to take part in more specific design
activities inspired by their ideas from the first meeting. The activities prompted them to reflect on their
ideas about social media and online privacy practices. In this second session, we had five tweens and four
parents participating. In small groups, the tweens selected the SNFs that they were interested in from the
list generated by brainstorming in the first session, designed a more detailed prototype of these SNFs, and
integrated the social media-like activities into the design, using the bags of stuff technique. We engaged
them in a discussion on functionality and privacy concerns as they imagined themselves participating in
their own design of the SNFs. At the conclusion of this co-design activity, we asked the tweens to answer a
series of privacy-related questions in the form of tweets (in 140 characters or less). As tweens shared their
responses, the researchers asked them to elaborate on their answers.

Figure 2: Tweens participating in bonded design inquiry

Concurrently, parents of these tweens took part in a focus group discussion where they provided
feedback that helped to refine the recommendations provided in the first focus group session. We discussed 
the idea of having layers of privacy and community moderation. After these discussions, parents participated 
in a “deal breaker” activity, where they were asked to indicate their comfort level with different features 
on a site (green indicated they were okay with a feature in a public space, yellow indicated they would be 
okay with a feature if it were only used with selected connections, and red signaled that they would not be 
comfortable with their child using a feature, regardless of the privacy customization). Finally, parents were 
also asked to write tweets (in 140 characters or less) in response to a series of privacy-related questions. 


5 Bags of Stuff is a prototyping technique in which children use big bags filled with art supplies such as glue, clay, string, markers, 
socks, and scissors to create low-tech prototypes of technology. 

6 Sticky noting is a technique for critiquing a prototype. The technique involves children and the adult design partners writing down
on sticky notes what they like or dislike about the current prototype. 

All of these sessions were video-recorded, and the researchers also reflected on the sessions through
observational notes immediately afterwards. All artifacts and prototypes produced during these sessions 
were photographed. The researchers watched the videos of all sessions, noted vital points made by 
participants, and transcribed salient excerpts. 

Adopting the approaches of grounded theory (Strauss & Corbin, 1998), all authors used open coding 
and selective coding techniques to analyze the transcripts from the design and the focus group sessions. 
Team members compared their coding to ensure consistency and reliability. Memos were kept of coding 
decisions to establish an audit trail. The memos were consulted to ensure that consistency had been 
maintained throughout the process. The emergent themes in relation to the research questions are reported 
below. 


4 Findings 


4.1 Navigating privacy and cyber-security restrictions: the tween perspective 


All tweens who participated acknowledged that they have to lie about their age to access SNFs such as 
Google+, YouTube, Facebook, etc. Thus, lying is indeed a common way that tweens navigate the SNFs as 
indicated by previous studies (Grimes & Fields, 2012; Lenhart et al., 2011; Livingstone, 2008; Steeves, 2006). 
From our analysis, it was evident that the tweens’ styles of navigating legal policies and managing privacy 
varied depending on their experience with SNFs and their perception of the responsibilities that they, their
parents, and site developers hold (or should hold) in the privacy landscape.

Experience with SNFs was the key determinant of how “savvy” the tweens were in discovering how
to circumvent restrictions imposed by COPPA. For example, Sean (pseudonyms are used for all tweens 
referenced in this paper), who is very familiar with SNFs, pays attention to patterns that sites use to detect 
a potential breach of COPPA. He describes: 


If you accidentally put a [birth] year that shows that you are under thirteen, you then go again [to
create an account], Google remembers that you have lied about your age, like you close the browser,
and open a new one, and it [Google] kicks you.


He goes on to explain how he used a different computer to create an account. Another tween, Chris, manages 
multiple email accounts and uses these emails as parent emails to authenticate account creation on various 
gaming sites. 

Several tweens mention that SNFs should rely not on age but on the maturity of tweens, and suggest
that some simple tests be conducted to authenticate maturity before registering on a site. From the
discussion, the nature of the tests was not well defined, and such tests would not comply with COPPA, but the idea
that the tweens think that they are mature enough to be online is intriguing. They also are vigilant about
what to share, what people learn about them through their online activities, and what can happen if the 
information that they have shared gets into the hands of “creepy people.” Sean responds to a privacy- 
related prompt, “Do not share info like where you live, how old you are, or your name or other important 
things to people you don’t know.” Sonya mentions: “Be careful of the people you choose to be your contacts.” 
She elaborates during the presentation that contacts who are adults must be adults that tweens are 
personally close to. 

All participating tweens feel that parents must play a pertinent role in the online activities of their 
tweens. Chris strongly indicates, “It is the parents’ fault if something goes wrong, they should be watching 
their kids.” Similarly, they also feel that SNF developers need to be equally responsible by monitoring patterns
of speech, spamming, and questionable online behavior of all of their users. For example, Leah stresses the
following:


Let kids join, but monitor their accounts and if anything bad is said by or to them, ban whoever
said it, [but]..give kids the space to do what they want, don’t block everything...make sure kids get 
to do what the same thing that adults can do, but make it safe. 


The responses above show that the tweens see keeping SNFs safe as a responsibility shared equally among
themselves, their parents, and SNF developers.


4.2 Parental concerns about social networking practices 


Madden et al. (2008) identify three types of perceived online threats that can help to understand how the 
Internet is conceptualized in a negative light: sexual solicitation, online harassment, and problematic 
content. Using this lens, we found that sexual solicitation was not what parents of these tweens
were most concerned about. Online harassment, particularly in the form of cyber-bullying, however, was
considered a vital threat. One parent described her hesitation in having her son participate in social media
due to the perceived threat of bullying and the lack of control over whom he would come into contact with. When
asked what concerned her about her child participating in social media, she responded:


And the bullying aspect that they keep talking about in the media, so we just don’t [participate in 
social media]. They did that penguin thing though, Club Penguin, but that’s kind of really 
anonymous. 


Another parent also identified the threat of online harassment through virtual bullying, but discussed how
this threat could be related to how kids view social media as a space that is not subject to the same rules
and expectations they encounter in physical spaces.


Kids can say harmful things in a physical room and they can do the same thing in a Google
hangout...so learning the same sort of social etiquette that you would want a child to follow...in a
physical space is also important in a virtual space.


Finally, problematic content most often refers to violent media and adult pornography (Madden et al., 
2008). The consensus among parents is that youth will unwillingly be subjected to such content when 
participating in online activities and that they will easily be able to access problematic content despite 
parental or legal restrictions, and that both forms of exposure will negatively impact youth (Madden et al., 
2008). 

Although parents identified with the perceived threats described by Madden et al. (2008), they also 
acknowledged the power of learning via SNFs. Parents described how their children use social media sites 
like YouTube to help with school projects and engage in problem solving in games such as Minecraft.


4.3 Parental roles in tween social networking practices 


Based on parent responses in focus groups, we identified three types of roles that parents play in their
tween’s social media use. One adult may demonstrate a perspective that overlaps more than one of these
broad categories, but such categorizations were useful for understanding parental views and behavior in
relation to their tween’s social media use.


4.3.1 The inspector: 


This role is characterized by a parent who monitors their child through strict supervision of the tween’s
behaviors. One parent described her practice of reviewing her tween’s search history to see which sites he 
visited. Other parents required that their child give them the password to all social media accounts, or use 
the parent’s own account so the parent could log in to view their child’s behavior. One parent also discussed 
managing a gaming server that their children and friends played on so history could be reviewed by the 
parent. 


4.3.2 The co-user:


Parents also described a role as a co-user when it came to interacting with their child and social media. 
This behavior included sitting next to the tween while they use social media and engaging in conversation about
what is happening online, requiring that a parent is in the same room when the tween uses the Internet, or 
establishing a virtual presence in social media, such as friending their child in social media or participating 
in the same game their tween plays. 


4.3.3 The independent:


The independent parental role indicates a parent who is comfortable with their child using social media 
with limited parental moderation. When it came to mainstream social media (a site targeted toward youth 
and adults alike), no parents in the study adopted an independent role. Some parents exercised the 
“independent” role, however, when their child used a site they knew had more built-in moderation structure, 
such as Club Penguin. Also, most parents were not aware of the inherent “socialness” of gaming spaces and
were more open to allowing their tweens to participate in these sites without any supervision.


5 Implications and Design Considerations 


This study highlights the need to balance a tween’s freedom and protection while providing a flexible
parental role that accommodates varied parental concerns and perceived threats of SNFs. The participating
tweens and their parents valued some type of moderation by the SNF developers, but also embraced
the idea of shared responsibility for sustaining a safe virtual space.

Building on the findings that we described in the above section, we provide recommendations for a 
layered community structure and propose a cyber-safety framework for NPS WebRangers that highlights 
the ecosystem of socialization and communication that may happen between tweens, their known peers, 
their parents, unknown peers with similar interests (in specific parks, animals, badges, etc.), and the general 
public. In the layered community, we provide a structure that allows for interpersonal engagement on the 
site while promoting tween and parent engagement in safety decisions. 

As discussed in the above findings, tweens and parents agreed that parents should play a role in 
keeping tweens safe online. Even “independent” parents who allowed their child the most freedom online 
expressed remaining concerns about safety. We also found that it is the interactive social features of SNFs 
that tweens find attractive, researchers find educational, and parents find concerning. To allow
tweens to legally participate in the dynamic learning environments of SNFs while allowing parents to be
involved to varying extents, we devised a layered community structure that provides tweens the interaction
and freedom they want while providing parents an easy way to observe their tween’s activity online and
moderate if desired. We believe that creating an online community with many features of mainstream SNFs
that welcomes tweens as well as their parents can provide a space where youth can learn, socialize, and
develop safe online practices.


5.1 Empowered tweens 

We propose two layered spaces for virtual engagement in an SNF (see Figure 3). Using WebRangers as a case, the
public space layer is where tweens can interact with park-related content while having very limited ability to
disclose identifying information about themselves. As minimal personal information is collected in this space,
participation in these sections does not require parent consent, although a connected parent account is strongly
recommended.


Figure 3: Layered Community Spaces

Livingstone (2008) discusses gradations of intimacy in SNFs by highlighting how a teenager wants
to make distinctions among her friends:


...She is frustrated that her site does not allow her to discriminate between who knows what about
her 300 or so ‘friends’... Being required to decide whether personal information should be disclosed
to ‘friends’ or to ‘anyone’ fails to capture the varieties of privacy that teenagers wish to sustain.
(Livingstone, 2008, p. 405)


This is similar to what tweens in this study conceptualize as balancing freedom and protection. Our
participants showed a desire to express their maturity by navigating social online spaces, but also recognized
that there were dangers in doing so and saw safety as a compromise between the platform, the parent, and
the user. These findings speak to a desire to have fine-grained control over interactions on the site to
support safe use. Thus, we propose both interest and inner circle connections in the personal space area. 
Through the interest circle connections, we provide a way for tweens to connect to those they do not know 
in real life in order to support each other’s interest in the parks or aspects of the park (such as animals, 
science, etc.). The inner circle represents reciprocal connections: both users must accept an inner circle 
connection in order to become part of each other’s inner circle. Our participants showed a desire to be not 
only users of the space, but also active participants in maintaining their online safety. With this in mind, 
both spaces have affordances for tweens to practice community moderation, such as peer flagging of
problematic content and reporting of online harassment. In addition to the community-based moderation,
automatic flagging measures are enforced for postings that are obviously out of bounds (such as sharing
personal information or using inappropriate language). This blended approach to security addresses our
participants’ desire for both autonomy and safety, while staying within the limited budget of public 
institutions like the NPS, which often cannot afford a full-time moderation staff. However, as our 
participants recognized, safety is a joint responsibility of the parent as well. Following from this, we make 
recommendations for the parent side of the cyber-safety framework in the following section. 
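
The connection and moderation rules described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the WebRangers implementation: the class and function names, the two regular expressions, and the peer-flag threshold of 2 are all hypothetical, and parent verification of inner circle connections is left out for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for posts that are "obviously out of bounds";
# a real site would need a far more careful policy than these two examples.
AUTO_FLAG_PATTERNS = [
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),    # looks like a phone number
    re.compile(r"\bmy address is\b", re.IGNORECASE),  # self-disclosed address
]
PEER_FLAG_THRESHOLD = 2  # assumed number of peer flags that triggers review


@dataclass
class Community:
    interest: set = field(default_factory=set)  # (subscriber, target) pairs, one-way
    requests: set = field(default_factory=set)  # pending inner circle requests
    inner: set = field(default_factory=set)     # frozenset({a, b}) pairs, reciprocal

    def subscribe(self, subscriber, target):
        """Interest circle: a one-way connection from subscriber to target."""
        self.interest.add((subscriber, target))

    def request_inner(self, sender, receiver):
        """Inner circle: formed only once both users have asked for it."""
        if (receiver, sender) in self.requests:
            self.requests.discard((receiver, sender))
            self.inner.add(frozenset((sender, receiver)))
        else:
            self.requests.add((sender, receiver))

    def in_inner_circle(self, a, b):
        return frozenset((a, b)) in self.inner


def needs_review(text, peer_flags):
    """Blend automatic flagging with community (peer) flagging."""
    auto = any(pattern.search(text) for pattern in AUTO_FLAG_PATTERNS)
    return auto or peer_flags >= PEER_FLAG_THRESHOLD
```

In this sketch a post is queued for review either because an automatic pattern matches or because enough peers have flagged it, which mirrors the blended approach without assuming a full-time moderation staff.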


5.2 Empowered parents 


As mentioned in the Findings section, we observed that parents take on different roles in interacting with 
their tweens online. In order to allow parents flexibility in their online involvement and the ability to take 
on a role as an inspector, co-user, or independent, we include customizable parent features in our
framework. Because tweens share personal information and media with their inner circle mentioned above, 
we built in a parent verification mechanism in this cyber-safety framework (parents can decide the extent 
of verification depending on what is shared and the role they wish to play). We recommend that in addition 
to parent verification for a tween to access an inner circle, parents should be encouraged to sign up with a 
parent account linked to their tween’s account. Instead of requiring parents to frequently log in to check 
activity, customizable parent digest options are recommended. Parents will be able to decide how often 
they receive a digest and what updates they would like the digest to include. The digest will also contain 
approval options for the release of personal information within the inner circle, such as tagging and group
formation. Customized options will allow parents to determine their involvement in their child’s use of
the SNF. Inspectors, for example, can require that they approve all posts their child shares with the inner 
circle and may choose to restrict their child from sharing certain media with inner circle members. Co-users 
may prefer a daily update of their child’s activity along with approval requests for inner circle connections. 
Independent parents may instead opt to receive a weekly update of only inner circle connections as well as 
when others tag their child in a post. This level of the cyber-safety framework allows the parents to choose
their involvement in the SNF, be aware of activity that is important to them, and modify their preferences
over time.
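
One way to picture the customizable digest is as a small settings record with a preset per parental role. The Python sketch below is a hypothetical illustration only: the field names, the event kinds, and the exact preset values (for example, notifying an inspector on every post, or not requiring connection approval from independent parents) are assumptions that loosely follow the examples in the text and are not a specification of the framework.

```python
from dataclasses import dataclass
from enum import Enum


class Frequency(Enum):
    EVERY_POST = "every post"
    DAILY = "daily"
    WEEKLY = "weekly"


@dataclass(frozen=True)
class DigestSettings:
    frequency: Frequency
    approve_posts: bool        # parent must approve each inner circle post
    approve_connections: bool  # parent must approve inner circle connections
    topics: frozenset          # kinds of updates the digest includes


# Assumed presets, one per parental role identified in the findings.
PRESETS = {
    "inspector": DigestSettings(
        Frequency.EVERY_POST, True, True,
        frozenset({"posts", "connections", "tags", "media"})),
    "co-user": DigestSettings(
        Frequency.DAILY, False, True,
        frozenset({"posts", "connections", "tags"})),
    "independent": DigestSettings(
        Frequency.WEEKLY, False, False,
        frozenset({"connections", "tags"})),
}


def build_digest(settings, events):
    """Keep only the events whose kind the parent asked to see."""
    return [event for event in events if event["kind"] in settings.topics]
```

Because the settings are plain data, a parent could start from one preset and adjust individual fields later, which matches the idea that preferences may change over time.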

The Privacy Ecosystem diagram (Figure 4) provides a breakdown of the cyber-safety framework,
and it details various types of interactions, community moderation, and parental control that are available 
within the layered environment of the NPS virtual community. Similarly, the Parent Interaction diagram 
(Figure 5) captures the essence of parents’ interactions on the site, highlighting the cyber-safety features 
that will allow parents to monitor their child’s activity in the virtual community. 


The Privacy Ecosystem

Each element on the site exists with certain restrictions on how information is shared. The
directional lines in the diagram below represent different relationships between users and site
features. Solid lines indicate a relationship where full information is shared between users. Dotted
lines represent limited sharing of information, like username and avatar, through mechanisms
such as feed updates.

1. The Park Profile—Tweens can upload media and blog posts to park profiles. Parks that the
tween subscribes to will send updates to their newsfeed with a link back to the park profile
page.

2. Discussion Forums—Tweens can participate in discussions about different park topics (for
example, wetland conservation) with only their public information visible. Updates on
conversations that the user has participated in will appear in their newsfeed.

3. Shared Interest Connections—Tweens can connect to peers on the site that they do not
know in real life by subscribing to them. This is a one-way connection where the subscriber
will receive updates in their newsfeed with their connection’s recent public activities. Tweens
can control what information is public to these connections and can manage settings so
approval is needed for an interest connection.

4. Topic Updates—Tweens can follow a specific topic and get notifications in their newsfeed
whenever new items tagged with this topic are added (for example, tweens can subscribe to
the tag “beavers”). This is a one-way relationship and no information is shared.

5. Inner Circle Connections—Real-life friends who have a parent-approved connection on the
site can share some personal information with one another. A tween’s feed will also be
populated with the activities of their inner circle connections.

6. The Parent Digest—All interactions on the site are shared to the parent digest, which parents
can adjust for the type of content delivered and frequency of updates.

7. The Wider Web—Some material is shared to the wider web, such as forum conversations
and park profiles. This content is subject to community and administrative moderation.


Figure 4: The Privacy Ecosystem 
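
The sharing levels that Figure 4 distinguishes (full, limited, and none) can be sketched as a lookup from relationship type to visible profile fields. This Python sketch is a hypothetical reading of the figure text: the relationship keys, the profile field names, and especially the assignment of each relationship to a level are assumptions, since the text does not state which lines in the diagram are solid or dotted.

```python
from enum import Enum


class Sharing(Enum):
    NONE = 0     # one-way topic follow: nothing about the tween is shared
    LIMITED = 1  # dotted line: username and avatar only
    FULL = 2     # solid line: full information


# Assumed mapping from relationship type to sharing level.
SHARING_BY_RELATIONSHIP = {
    "topic_update": Sharing.NONE,
    "discussion_forum": Sharing.LIMITED,
    "shared_interest": Sharing.LIMITED,
    "inner_circle": Sharing.FULL,
    "parent_digest": Sharing.FULL,
}

LIMITED_FIELDS = {"username", "avatar"}


def visible_profile(profile, relationship):
    """Return the part of a profile that a given relationship may see."""
    # Unknown relationships default to sharing nothing.
    level = SHARING_BY_RELATIONSHIP.get(relationship, Sharing.NONE)
    if level is Sharing.FULL:
        return dict(profile)
    if level is Sharing.LIMITED:
        return {k: v for k, v in profile.items() if k in LIMITED_FIELDS}
    return {}
```

Defaulting unknown relationships to no sharing is a deliberately conservative design choice for a site aimed at tweens; a fuller model would also let tweens and parents narrow what inner circle connections can see.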

Parent Interactions


1. Approving Connections — Parents are able to see recent interactions (for
example tagging of content, or accepting requests for inner-circle
friendship), and either approve or decline these connections. This feature
was implemented to allow for rich social interactions such as tagging and
friending, while still fitting with a level of control that our parent
participants felt comfortable with. These interactions would be included in
the parental digest (mentioned in item three).

2. Determining Privacy Settings — Tweens and parents would be able to work
together to determine privacy settings, allowing parents to control what
aspects of a tween’s profile are available to the broader web.

3. Parental Digest — This is an automatically collated report which details
recent interactions that the tween has taken on the site. This allows for the
parent to be involved with their child’s life online. Following our
observation of the three different types of parental involvement (Inspector,
Co-user, and Independent), this feature has a customizable level of
reporting.

4. Flagging Content — Following from both the recognition of our participants
of the importance of creating a healthy community, and the expense of live
moderation, community flagging of objectionable content serves to take a
role in setting and upholding community norms for the site.


Figure 5: The Parent Interactions 


6 Further Research 


This study points to the importance of technology design that supports expectations of tweens and their
parents regarding the disclosure of personal information to the public and selected friends. The resulting
cyber-safety framework can be applied across diverse SNFs that learning institutions may use to engage
their tween audience. As an organization that is committed to providing access to its online and in-person
programming for patrons of all ages, the NPS is invested in integrating the proposed cyber-safety
framework into its existing WebRangers learning environment and is currently working on acquiring funds,
expertise, and support from internal and external resources.

Although we believe that our proposed cyber-safety framework can be used by NPS and adapted 
by other organizations interested in building similar spaces for tweens, we acknowledge the underlying 
complexity of executing such privacy settings for socio-technical systems and the potential limitations of 
the suggested framework. Both the interface and interaction design need to be intuitive and easy to navigate 
for tweens and their parents who will have varying degrees of technical and new media literacy. In addition, 
the evolving needs of tweens, the varying roles that parents can play in their tweens’ social networking
practices (due to work or family obligations), and the affordances and limitations of technology platforms
will need to be considered in the design of such virtual environments for tweens and younger children. The
intricacies of the framework will also need to be refined based on the features and interactivity that the learning
institutions ultimately decide to offer to the users of their sites, such as virtual parks, massively multiplayer
online games, and scavenger hunts.

In the next phase of our work, we will be working with a larger number of tweens and their parents 
to design working prototypes, which will handle further practical and technological considerations. 
Nevertheless, this study opens up the possibilities of new approaches to create and sustain a workable and 
safe virtual environment that empowers tweens and their parents in protecting the cyber-safety of their 
community. Instead of throwing tweens into the wild or snooping on their activities, we promote nurturing 
their online practices by building a resilient community of tweens, parents, and SNF developers. 


Abstract

Prior research suggests that listeners from different cultural backgrounds appreciate music differently. 
Although music mood/emotion is an important part of music seeking and appreciation, few cross-cultural 
music information retrieval (MIR) studies focus on music mood. Moreover, existing studies on cross-cultural
music perception often compare listeners from only two cultures, in most cases Western vs. non-Western
cultures. To fill these gaps, this study compares music mood perceptions of listeners from
three distinct cultures: American, Korean, and Chinese. Our findings reveal that the perceptions of the 
three cultural groups are generally different, but in many aspects, Korean listeners are situated in between 
listeners from the two other cultures. This paper describes the comparison of the three cultural groups 
from the perspectives of mood perceptions, musical (stimuli) characteristics, and listeners’ (subjects) 
characteristics. The findings of this study have implications for the design of cross-cultural and global 
MIR systems. 


1 Introduction


Over the past decade, much of the music information retrieval (MIR) research has focused on popular
or classical music from Western cultures (Serra et al., 2013). As a result, there is a lack of research that can 
help us tackle the challenges in developing technologies that reflect musical diversity in different cultural 
contexts. At present, the issue of cultural diversity is ever more important in the field of MIR due to the 
fact that music is increasingly being appreciated across cultural boundaries through globally accessible 
websites such as YouTube as well as various social media. This study aims to contribute to improving our 
understanding of multicultural issues in MIR, specifically how people from different cultural backgrounds 
perceive the mood of music. In MIR, music mood has recently emerged as a potential metadata element or 
feature for better organization of and access to music (Lee et al., 2012). Yet, our general understanding of 
how real users perceive music mood is still lacking, and especially so with regard to users from multiple 
cultural backgrounds. 

This study attempts to fill two significant gaps in current research. First, although the number of 
cross-cultural MIR studies has gradually been increasing, few studies explore music mood/emotion, which is
an important part of music appreciation. Second, many cross-cultural studies only compare two cultures, 
usually Western vs. non-Western cultures (e.g., Indian and European listeners in Gregory and Varney 
(1996), New York and Hong Kong music users in Nettamo et al. (2006), Korean and North American users 
in Lee et al. (2005), and Western and African listeners in Eerola et al. (2006)). Few studies explore how 
music information behaviors of users from multiple non-Western cultural groups compare to one another, 
and to a Western cultural group. We selected three cultural groups for comparison (American, Korean, and 
Chinese) so that we are not only comparing Western vs. non-Western users, but also two non-Western user
groups.

Korean and Chinese user groups were chosen because of their unique cultural distinctions and
connections. As Korea and China are geographically close to each other, there have been many interactions 
between the two countries in their long history of thousands of years. Therefore Korean and Chinese cultures 
have developed common elements and similar traditions. However, over the last century South Korea has 
been heavily influenced by Western cultures, specifically, American culture, primarily due to America’s 
involvement in Korea following World War II. For decades, Korean people have been actively listening to 
Western music, and American pop songs are well-received and quite popular in Korea. China, on the other 
hand, has remained much more isolated from the influences of Western culture until fairly recently, about 
two decades ago. We wanted to understand how similar or different these two non-Western user groups
are, compared to the American user group. A comparison among more than two cultural groups will bring
us one step closer to building cross-cultural and global MIR systems.


2 Related Work 


Existing comparative studies of Western and non-Western music listeners in music psychology tend to focus 
on aspects other than music mood, for example, perception of complexity (Eerola et al., 2006), the role of 
music in everyday life (Rana & North, 2007), functions of music and music preference (Schafer et al., 2008), 
and so on. However, a small number of studies do examine how music mood judgments can transcend 
cultural boundaries. In some studies, listeners were asked to identify the mood of music from different 
cultures (e.g., Balkwill and Thompson (1999), Fritz et al. (2009)). These studies discovered that some
basic emotions such as happiness, sadness, anger, or fear are recognizable by listeners across cultures. More
specifically, Balkwill and Thompson (1999) discovered that Western listeners were able to identify
intended emotions such as joy, sadness, and anger in Hindustani ragas. Fritz et al. (2009) found that native
African listeners could identify happy, sad, and scared/fearful emotions in Western music. In other studies, 
listeners from different cultures were asked to describe the mood of music from single or multiple cultures 
(e.g., Wong et al. (2009), Gregory and Varney (1996), Hu and Lee (2012)). They found many subtle 
differences in how mood is perceived by listeners from multiple cultures, and that cultural tradition and 
background, in addition to the inherent qualities of the music, affect listeners’ mood judgments. Wong et 
al. (2009) revealed the in-culture bias of Indian and Western listeners when judging the tension in music. 
Gregory and Varney (1996) found that Indian and European listeners show a number of subtle differences 
in their mood descriptors and cultural tradition was more important than inherent qualities of music when 
listeners determined the music mood. Hu and Lee (2012) also found that mood judgments do differ between 
American and Chinese listeners. 

This study builds upon these works, in particular our previous work comparing music mood
perceptions of American and Chinese listeners (Hu & Lee, 2012). Important findings of that work include
the following. Listeners from different cultural backgrounds tended to give different mood labels when the mood of a
song was not very obvious. Listeners agreed more with other listeners from the same cultural group than
with those from the other cultural group. American and Chinese listeners had different opinions on all
five music genres under consideration (i.e., Dance, Easy listening, Pop, Rock, and Other), and listeners’ gender
and age did not have as strong an influence as cultural background on their judgments of music mood. It
may not be surprising that many differences exist between listeners from American and Chinese cultures, 
since the two are often recognized as representing the contrasting Western and Eastern cultures. The case 
of Korea is a bit more intriguing, as Korean culture shares similar roots with Chinese culture, but is also 
strongly influenced by American culture. 

To summarize, this study distinguishes itself from related cross-cultural MIR studies in that: 1) it
considers three cultural groups, with one of them highly influenced by the other two; 2) it explicitly measures
and compares the distances among the three cultural groups; and 3) the songs (stimuli) used in this study
cover a range of musical characteristics (i.e., instrumental vs. vocal, different genres).


3 Research Question and Method


3.1 Research Question 


The overarching question in this study is how similarly or differently listeners from American, Korean, and 
Chinese cultural backgrounds perceive music mood. We attempt to answer this question from the three 
following perspectives: mood judgment distributions and agreement levels, music (stimuli) characteristics, 
and listeners’ (subjects) characteristics. 


3.2 Study Design 


Listeners from the three cultural groups were recruited to answer an online music listening survey consisting 
of thirty 30-second music clips. The clips were selected from Western songs: half from the APM (Associated
Production Music) library¹ (instrumental music in a range of different genres), and the other half from the
USPOP collection (Ellis et al., 2003), which consists of Pop and Rock songs with lyrics. All the songs had
previously been evaluated by three to five listeners whose cultural backgrounds spanned Europe, North
America, and Asia. To avoid selecting the “obvious” songs with the highest agreement, we selected
songs that had greater disagreement among the listeners. Because the music pieces from the APM library were
instrumental, we balanced our test dataset by drawing the other half of the 30 pieces from
the USPOP collection (Ellis et al., 2003), ensuring that they all had vocal components. A more detailed
description of how the songs were selected can be found in Hu and Lee (2012). In responding to the survey,
participants listened to each of the clips and selected one out of five mood clusters that best describes the 
mood expressed by the clip. Each mood cluster contains several mood terms that collectively better represent 
a particular mood as opposed to a single mood term. If none of the mood clusters were deemed appropriate, 
the listener could choose the “Other” option and specify the mood in their own vocabulary. 
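
To make the structure of the survey responses concrete, the following Python sketch tallies mood judgments per cultural group. It is an illustration only: the tuple layout, the cluster labels C_1 to C_5 (borrowed from Table 1), and the `tally` function are assumptions, and any data passed to it here would be made up rather than drawn from the study.

```python
from collections import Counter

# Five mood clusters plus the free-text "Other" option.
CLUSTERS = ["C_1", "C_2", "C_3", "C_4", "C_5", "Other"]


def tally(responses):
    """Count choices per cultural group.

    responses: iterable of (group, clip_id, choice) tuples, where choice
    is one of CLUSTERS. Returns {group: [count for each entry in CLUSTERS]}.
    """
    counts = {}
    for group, _clip_id, choice in responses:
        if choice not in CLUSTERS:
            raise ValueError(f"unknown choice: {choice}")
        counts.setdefault(group, Counter())[choice] += 1
    return {group: [c[k] for k in CLUSTERS] for group, c in counts.items()}
```

A table of this shape, with one row per cultural group and one column per response option, is the kind of contingency table on which distributions of judgments can be compared across groups.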

The five mood clusters were adopted from the Audio Music Mood Classification task in the 
community-based evaluation framework MIREX (Music Information Retrieval Evaluation eXchange) (Hu 
et al., 2008) (reprinted in Table 1). MIREX is a venue for comparing state-of-the-art algorithms and
systems that are relevant for music information retrieval. The Audio Music Mood Classification task was
first included in 2007, and a number of relevant algorithms have been evaluated every year since then. The five
mood clusters used in the MIREX Audio Music Mood Classification task were derived by conducting a
hierarchical clustering on a co-occurrence matrix of music mood labels from the All Music Guide². They are
commonly used in a number of MIR studies on music mood evaluation and comparison such as Bischoff et 
al. (2009), Laurier et al. (2009), Dang and Shirai (2009), Panda and Paiva (2012), to name a few. 


Cluster            Mood terms
Cluster 1 (C_1)    passionate, rousing, confident, boisterous, rowdy
Cluster 2 (C_2)    rollicking, cheerful, fun, sweet, amiable/good natured
Cluster 3 (C_3)    literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4 (C_4)    humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5 (C_5)    aggressive, fiery, tense/anxious, intense, volatile, visceral


Table 1: Five mood clusters used in MIREX 


The listeners were recruited via mailing lists for university students. Listeners in each group identified
themselves as “American” raised in the United States, “Korean” raised in South Korea, or “Chinese” raised
in Mainland China. The survey was translated into Korean and Chinese (simplified characters) for the respective
user groups in order to help them better understand the meanings of the mood labels. Table 2 shows the
translation of the mood terms.


1 Accessible at: http://www.apmmusic.com/apm-libraries
2 Accessible at: http://www.allmusic.com


[Table 2 body: for each cluster C_1 to C_5, one row of Korean terms and one row of Chinese terms; the translated terms are not recoverable from this copy.]


Table 2: Korean and Chinese translation of the five mood clusters 


In addition to the questions asking about music mood, the participants were asked to provide basic 
demographic information such as gender and age. We also asked the following two questions to determine 
their overall familiarity with the songs: 1) whether the participants had previously heard the song, and 2) whether
they could name the artist and the song title. The two questions together can help gauge listeners’ level of
familiarity with the songs in a more objective manner than listeners’ self-perception. 


3.3 Data Analysis Methods 

There were two main methods used for data analysis. First, Chi-square (χ²) independence tests (Sokal &
Michener, 1958) were applied to examine whether the distribution of mood judgments across mood clusters
was independent of variables such as cultural background, gender, and age. The results can tell us
whether each of these variables is related to mood judgments on a broad level. Chi-square distances were 
also calculated to further quantify the difference between two histograms (distribution of judgments across 
the clusters). Second, mood judgments of listeners from the same or different cultural backgrounds on the 
same clips were paired up to calculate the agreement ratio. This captures each listener’s judgments at the 
finest granularity and reveals the levels of agreement reached by groups of listeners in a variety of conditions. 
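
The first method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example counts are hypothetical, and the paper does not spell out its formula for the Chi-square distance, so the common symmetric form 0.5 * sum((p - q)^2 / (p + q)) on normalized histograms is assumed here. The sketch returns the test statistic and degrees of freedom only; a p-value would come from a Chi-square table or a statistics library.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Assumes every row and column total is positive, so that all
    expected counts are non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi_square_distance(hist_a, hist_b):
    """Chi-square distance between two histograms (assumed definition).

    Both histograms are normalized to proportions first; bins that are
    empty in both histograms contribute nothing.
    """
    total_a, total_b = sum(hist_a), sum(hist_b)
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        p, q = a / total_a, b / total_b
        if p + q > 0:
            distance += (p - q) ** 2 / (p + q)
    return 0.5 * distance


# Hypothetical judgment counts over C_1..C_5 and "Other" for two groups
# (illustrative numbers only, not data from this study).
group_a = [40, 60, 30, 20, 35, 15]
group_b = [55, 45, 45, 20, 30, 5]
statistic, dof = chi_square_independence([group_a, group_b])
distance = chi_square_distance(group_a, group_b)
```

With two groups and six response categories, the test has (2 - 1) x (6 - 1) = 5 degrees of freedom. Under the assumed definition, the distance is 0 for identical distributions and 1 for distributions that share no category.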


4 Results and Discussion 


4.1 Characteristics of Participants 
A total of 31 complete responses were collected from each of the three cultural groups (93 participants in 
total). Table 3 shows the demographics of the participants. 


Cultural Background   Age (Min)   Age (Max)   Age (Avg.)   Male   Female
American (AM)         22          55          31.8         6      25
Korean (KR)           23          39          30.7         14     17
Chinese (CN)          19          46          26.2         10     21


Table 3: Demographics of survey participants 


Approximately half of the Korean and Chinese participants (13 each) responded that they had stayed in the
U.S. for less than a year as of the date of the survey. Nine Korean and 13 Chinese participants specified 1-5
years, and nine Korean and five Chinese participants specified 6-15 years. The listeners’ familiarity with
the songs was categorized as follows: high familiarity means that the listener answered yes to both
familiarity questions (i.e., whether they had heard the song before and whether they could identify the artist
name and the song title), medium familiarity indicates yes to the first question only, and low familiarity
indicates no to both questions. Table 4 shows the distribution of familiarity levels across the three cultural groups.


Group   Low   Medium   High   No Answer   Total
AM      617   120      192    1           930
KR      832   63       33     2           930
CN      836   75       14     5           930


Table 4: Distribution of familiarity levels across three cultural groups 


Overall, American listeners were much more familiar with the songs in our dataset. Korean and Chinese
listeners indicated that they were not familiar with the song (i.e., no to both questions) for 89.5% and 89.9%
of all responses, respectively, which is significantly higher than the 66.3% for American listeners. This is not
surprising as the 30 clips were from the Western culture. Moreover, it is noteworthy that the Korean listeners
indicated high familiarity more than twice as often as Chinese listeners did. This is in accordance with the fact
that Korean people have been listening to Western music for a much longer period than Chinese people.
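
The familiarity categories and the percentages quoted above can be reproduced with a short sketch. The function and variable names are hypothetical. The counts are those of Table 4, where each group gave 31 listeners x 30 clips = 930 responses. How a response of "no" to the first question but "yes" to the second was coded is not stated in the text; the sketch treats it as low familiarity, which is an assumption.

```python
def familiarity_level(heard_before, can_identify):
    """Map the two yes/no familiarity answers to a level.

    None stands for a missing answer. A "no"/"yes" combination is not
    defined in the text and is treated as "Low" here (assumption).
    """
    if heard_before is None or can_identify is None:
        return "No Answer"
    if heard_before and can_identify:
        return "High"
    if heard_before:
        return "Medium"
    return "Low"


# Response counts per group, copied from Table 4.
TABLE_4 = {
    "AM": {"Low": 617, "Medium": 120, "High": 192, "No Answer": 1},
    "KR": {"Low": 832, "Medium": 63, "High": 33, "No Answer": 2},
    "CN": {"Low": 836, "Medium": 75, "High": 14, "No Answer": 5},
}


def share(group, level):
    """Percentage of a group's responses at the given level, to one decimal."""
    counts = TABLE_4[group]
    return round(100 * counts[level] / sum(counts.values()), 1)


low_shares = {group: share(group, "Low") for group in TABLE_4}
high_ratio_kr_to_cn = TABLE_4["KR"]["High"] / TABLE_4["CN"]["High"]
```

The low-familiarity shares come out at 66.3% (AM), 89.5% (KR), and 89.9% (CN), and Korean listeners gave high-familiarity responses about 2.4 times as often as Chinese listeners (33 versus 14).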


4.2 Mood Judgments 


4.2.1 Cultural Backgrounds vs. Mood Judgments 


Figure 1 shows the distributions of mood judgments from the three groups of listeners across the five mood
clusters. The proportions of songs selected for Cluster_2 (rollicking, cheerful, fun, sweet, amiable/good
natured) for American and Korean listeners were approximately the same and much higher than for Chinese
listeners. The proportions for Cluster_3 (literate, poignant, wistful, bittersweet, autumnal, brooding) for Chinese
and Korean listeners were similar, and higher than for Americans. Koreans were much less likely to select Cluster_5
(aggressive, fiery, tense/anxious, intense, volatile, visceral) than the other two groups, and fell between
the other two groups in terms of labeling Cluster_1 (passionate, rousing, confident, boisterous, rowdy)
and “Other.” Koreans were also more likely to select Cluster_4 (humorous, silly, campy, quirky, whimsical,
witty, wry) than the other two groups.


Figure 1: Mood judgments across mood clusters (legend: AM, KR, CN; horizontal axis: C_1 to C_5 and Other)


The fact that American listeners tend to select “Other” more often than Korean or Chinese listeners may
be due to the fact that they are more familiar with the Western songs and the cultural context in which
the songs are produced; thus, their perception of the mood of a particular song may be influenced by more
than the intrinsic properties of the song. It may also reflect the differing cultural characteristics of the
listeners, i.e., collectivist Eastern listeners and individualistic Western listeners as discussed in previous
cross-cultural music mood research (e.g., Boer & Fischer, 2010; Hu & Lee, 2012). American listeners
disagreed with the presented mood clusters approximately twice as often as Korean listeners and three times
as often as Chinese listeners.

A Chi-square test on the distributions shows that cultural background and mood judgment are not
independent (χ² = 96.72, df = 5, p < 0.0001). In other words, there is a relationship between cultural
background and mood judgments. Pairwise tests (on each pair of two cultural backgrounds) gave the
same results. To quantify the differences across cultures, Chi-square distances between each pair of mood
judgment distributions (histograms) were calculated and compared. The distance between American and
Chinese listeners’ judgments was the highest (d = 0.51), followed by American and Korean listeners (d =
0.02), and Chinese and Korean (d = 0.01).

Overall, Chinese and Korean listeners perceived music mood more similarly to each other than to American
listeners. In fact, between Korean and Chinese listeners, mood judgments were significantly different for
only two out of 30 songs (χ² = 15.21–24.50, df = 5, p < 0.01); both were Dance music without lyrics.
American and Korean listeners significantly disagreed on nine out of 30 songs (χ² = 9.68–26.66, df = 4–5,
p < 0.05) while American and Chinese listeners disagreed on 14 of the 30 songs (χ² = 7.90–23.83, df =
3–5, p < 0.05). The song with the highest disagreement between American and Korean listeners was “Got
to Get You into My Life” by The Beatles, consistent with the result between American and Chinese listeners
as reported in Hu and Lee (2012). Figure 2 shows the mood judgments on this song across the three cultural
groups. Most of the American listeners selected Cluster_2 (rollicking, cheerful, fun, sweet, amiable/good
natured) or Cluster_1 (passionate, rousing, confident, boisterous, rowdy) for this song whereas almost
one-third of Korean and Chinese listeners selected Cluster_4 (humorous, silly, campy, quirky, whimsical, witty,
wry).


Figure 2: The mood judgments on “Got to Get You into My Life” by The Beatles across the three cultural
groups


Based on the answers given with regard to the listeners’ familiarity with the song, 29 out of 31 American
listeners indicated that they had listened to this song and 19 were familiar enough with this song that they
could identify the artist name and the song title. However, only four out of 31 Korean listeners had heard
the song before and two could identify it, and none of the Chinese listeners had even listened to this song
prior to this survey. This difference in the level of familiarity, as well as the fact that Americans are much
more likely to have background information on the Beatles, seems to have affected how listeners determined
the mood of this song.


4.2.2 Agreement between Pairs of Mood Judgments 

Each music clip in this survey had 31 answers from each cultural group. These answers were paired up,
either within the same cultural group or across different cultural groups. Agreement ratios were calculated
by dividing the number of pairs with agreed judgments by the total number of pairs (Table 5). Korean
listeners showed the lowest intra-cultural agreement ratio of the three groups, indicating that Koreans gave
more diverse opinions. In general, cross-cultural agreement levels were lower than intra-cultural
ones, except for pairs between Korean and Chinese listeners.


            American   Korean   Chinese
American    0.35       0.30     0.30
Korean                 0.32     0.32
Chinese                         0.35


Table 5: Agreement ratios among intra- and cross-cultural pairs of judgments 
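
The pairing procedure behind Table 5 can be illustrated with a minimal sketch. The function names and the toy labels are hypothetical. The sketch assumes that every listener is paired with every other listener in the same group (intra-cultural) or with every listener in the other group (cross-cultural) on the same clip; the text says only that the answers "were paired up".

```python
from itertools import combinations, product
from math import comb


def intra_group_agreement(labels_by_clip):
    """Share of same-group listener pairs that chose the same label.

    labels_by_clip holds one list of labels per clip, one label per listener.
    """
    agreed = total = 0
    for labels in labels_by_clip:
        for a, b in combinations(labels, 2):
            agreed += a == b
            total += 1
    return agreed / total


def cross_group_agreement(group_a_by_clip, group_b_by_clip):
    """Share of cross-group listener pairs that chose the same label."""
    agreed = total = 0
    for labels_a, labels_b in zip(group_a_by_clip, group_b_by_clip):
        for a, b in product(labels_a, labels_b):
            agreed += a == b
            total += 1
    return agreed / total


# Toy data: one clip, three listeners in one group and two in another.
toy_intra = intra_group_agreement([["C_1", "C_1", "C_2"]])
toy_cross = cross_group_agreement([["C_1", "C_2"]], [["C_1", "C_1"]])

# Pair counts implied by 31 listeners per group and 30 clips.
intra_pairs = comb(31, 2) * 30  # 465 pairs per clip
cross_pairs = 31 * 31 * 30      # 961 pairs per clip
```

In the toy data, one of the three same-group pairs agrees (ratio 1/3) and two of the four cross-group pairs agree (ratio 0.5). Under the pairing assumption, each intra-cultural ratio in Table 5 is based on 13,950 pairs and each cross-cultural ratio on 28,830 pairs.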


4.2.3 Mood Clusters vs. Cultural Groups 


Table 6 shows the number of agreement pairs across the mood clusters. It shows that different cultural 
groups tended to agree more on different mood clusters: Americans agreed more on Cluster_2 (rollicking, 
cheerful, fun, sweet, amiable/good natured) and Cluster_5 (aggressive, fiery, tense/anxious, intense, 
volatile, visceral), Koreans on Cluster_2, and Chinese on Cluster_1 (passionate, rousing, confident, 
boisterous, rowdy). Based on the intra-cultural agreement counts, Chi-square distances were calculated to 
compare each of the cultural group pairs. The results again show the biggest difference between American
and Chinese listeners (d = 0.5), followed by American and Korean (d = 0.05), and Korean and Chinese
(d = 0.03).


      C_1    C_2    C_3    C_4   C_5    Other   Total
AM    706    1477   778    587   1094   270     4912 (35%)
KR    1101   1332   1011   566   461    57      4528 (32%)
CN    1355   995    1203   443   894    11      4901 (35%)


Table 6: Number of agreed pairs across mood clusters among listeners within each cultural group 
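
As a consistency check, the totals and percentages in Table 6 follow from the per-cluster counts if each group contributes C(31, 2) x 30 = 13,950 listener pairs, which is our reading of the pairing procedure rather than a figure stated in the text. The variable names below are hypothetical. The "Other" count for CN is taken as 11, the value that makes the row sum to the printed total of 4901.

```python
from math import comb

# All same-group listener pairs over all clips (assumed pairing scheme).
TOTAL_PAIRS = comb(31, 2) * 30

# Agreed pairs per cluster (C_1..C_5, Other), copied from Table 6.
AGREED_PAIRS = {
    "AM": [706, 1477, 778, 587, 1094, 270],
    "KR": [1101, 1332, 1011, 566, 461, 57],
    "CN": [1355, 995, 1203, 443, 894, 11],
}

row_totals = {group: sum(counts) for group, counts in AGREED_PAIRS.items()}
agreement_ratios = {
    group: round(total / TOTAL_PAIRS, 2) for group, total in row_totals.items()
}
```

The row totals are 4912, 4528, and 4901, and the resulting ratios of 0.35, 0.32, and 0.35 match both the percentages in Table 6 and the intra-cultural ratios in Table 5.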


4.3 Music Characteristics 


4.3.1 Instrumental vs. Vocal 


Half of the music clips in our dataset were instrumental, and the other half were vocal. Comparing the 
agreement ratios on these two types of songs can help reveal the extent to which lyrics can affect the chance
of agreement within and across cultures. Table 7 shows the comparison of intra- and cross-cultural
agreement ratios for instrumental and vocal music. The lyrics of the vocal songs were in English. The fact 
that American listeners reached a much higher agreement ratio on vocal songs indicates that lyrics might 
have affected how listeners made the mood judgments. This is consistent with the findings in previous 
research by Lee et al. (2012) that lyrics do affect how users determine the music mood. 

For the other two groups whose native language is not English, the agreement ratios on vocal music 
were approximately the same. As the participants were recruited over university mailing lists, we assume 
most of the Korean and Chinese listeners had similar levels of English fluency. However, future research is 
necessary to determine whether and to what extent listeners actually interpret the meaning of lyrics while 
listening to songs with lyrics in their second languages. In any case, vocal music seems to have helped
Korean listeners, but not Chinese listeners, reach a higher level of agreement than instrumental music
did. Cross-cultural agreement ratios on instrumental music were lower than intra-cultural agreement
ratios, but cross-cultural agreement ratios on vocal music were the same as intra-cultural agreement, except 
for the pair of AM-KR. This suggests that Korean and Chinese listeners may appreciate the mood of 
Western vocal music in similar ways, but further research is warranted to confirm this hypothesis. 


                Instrumental   Vocal   All
American        0.28           0.41    0.35
Korean          0.31           0.34    0.32
Chinese         0.36           0.35    0.35
Across AM-KR    0.26           0.34    0.30
Across AM-CN    0.25           0.35    0.30
Across KR-CN    0.31           0.34    0.32


Table 7: Intra- and cross-cultural agreement ratios on instrumental and vocal music 


4.3.2 Genres 


Analyzing the music mood judgments of the three cultural groups with respect to music genre showed that
cultural groups did not have a significant relationship with mood judgments. However, looking at pairs of
cultural groups reveals different patterns. When considering American and Chinese participants, mood
judgments and cultural groups were not independent (i.e., they were related) across music clips in all genres
(χ² = 21.91–46.68, df = 5, p < 0.001) (Hu & Lee, 2012). For American and Korean participants, mood judgments
and cultural groups were independent on Pop music (χ² = 5.22, df = 5, p = 0.39) and Other music (χ² =
10.09, df = 5, p = 0.07). The Other genre included one song in each of the following genres: Folk, Metal,
Reggae, Country, Oldies and Rap. This indicates that Korean and American listeners perceived the mood
of Pop and Other music similarly. Even stronger similarity existed between Korean and Chinese listeners:
their mood judgments and cultural background were independent on Easy-listening (χ² = 8.85, df = 5, p =
0.10), Rock (χ² = 10.93, df = 5, p = 0.05), and Other (χ² = 4.73, df = 5, p = 0.45) songs. In other words,
they perceive the mood of these three genres in similar ways. Among all the genres, only Dance music
always received significantly different mood judgments cross-culturally, no matter which two cultural groups
were paired up.

Table 8 shows the agreement ratios within and across cultures in each genre. The numbers marked with an
asterisk indicate the most agreed-upon genre in each row. Within cultures, American listeners agreed most on
Pop songs, Koreans on Easy-listening, and Chinese on Other genres. Cross-cultural comparison shows that
American listeners agreed with Korean and Chinese listeners more on Pop music, while Korean and Chinese
listeners agreed more on Easy-listening. The agreement on Dance music was the lowest overall. Across all
genres, cross-cultural agreements were generally lower than intra-cultural agreements.


         Dance   Easy-listening   Pop     Rock   Other   All
AM       0.30    0.29             0.46*   0.35   0.31    0.35
KR       0.25    0.38*            0.35    0.31   0.32    0.32
CN       0.29    0.38             0.32    0.35   0.41*   0.35
AM-KR    0.24    0.28             0.36*   0.29   0.30    0.30
AM-CN    0.22    0.28             0.33*   0.31   0.30    0.30
KR-CN    0.21    0.38*            0.34    0.35   0.35    0.32


Table 8: Agreement ratios within and across cultures in each genre 


4.4 Listener Characteristics


4.4.1 Gender vs. Mood Judgments


We tested the independence between the listeners’ gender and distributions of mood judgments across five
mood clusters. The results indicate that gender and mood judgments are not independent when aggregating
judgments from all cultural backgrounds (χ² = 36.87, df = 5, p < 0.0001). They are also not independent
when considering American listeners (χ² = 14.99, df = 5, p = 0.01) and Korean listeners (χ² = 27.04, df
= 5, p < 0.0001), respectively. This means that male and female listeners in these two cultural groups did
judge the music mood differently. However, that was not the case for Chinese listeners: male and female
listeners in fact showed similar patterns in mood judgments (χ² = 4.33, df = 5, p = 0.50).


4.4.2 Age vs. Mood Judgments 


The age ranges of listeners in the three cultural groups differed but all groups included listeners from 22 to
39 years old. We split this common age range into two groups, age 22 to 29, and age 30 to 39, in order to
test whether mood judgments depend on age groups in the three cultures. For American listeners, the two
age groups showed similar mood judgments (χ² = 9.20, df = 5, p = 0.10) while for Koreans and Chinese,
the two age groups judged music mood differently (Koreans: χ² = 46.89, df = 5, p < 0.0001; Chinese: χ² =
19.39, df = 5, p = 0.002). This could partly be attributed to the fast changing social environments in Korea
and China in the past decades. In both countries, people with an age difference of about 10 years would have
grown up in very different living conditions and social dynamics which certainly would have played a role
in shaping a person’s musical tastes. This is not the case for people who grew up in a relatively stable
environment in developed countries such as the United States.


5 Conclusion and Future Work 


Our findings reveal that the perceptions of the three cultural groups do differ. Based on the cultural groups 
they belong to, listeners behaved differently in their selection of mood clusters as well as agreement ratio 
in each of the mood clusters. However, in many aspects, Korean listeners are situated between listeners 
from American and Chinese cultures, evidenced by the Chi-square distances calculated on overall mood 
judgment distribution and pair-wise agreement on mood clusters, agreement ratio comparison between vocal 
and instrumental music as well as agreement ratio comparison across different music genres. Notably, 
Korean listeners showed the lowest intra-culture agreement ratio of the three groups. 

Despite the similar level of English abilities of Korean and Chinese listeners as well as their similar level of
familiarity with the songs in the dataset, they did perceive music moods differently in many cases. South
Korea has had stronger influences from Western culture over the past few decades than China. These different
levels of exposure to Western music in the two countries may be one of the reasons why Korean and Chinese
listeners perceived music mood differently, and why Koreans seemed to behave more similarly to Americans
in much of the comparison. In addition, user characteristics affected how listeners in certain cultural groups
determine music mood: for example, gender affected American and Korean groups, and age affected Korean
and Chinese groups.

The three-way comparison among American, Korean, and Chinese music listeners revealed that 
there was in fact some difference between Korean and Chinese listeners although they were both representing 
non-Western user groups. This implies that future cross-cultural music mood studies involving real users 
should explore similarities and differences among multiple cultural groups, not just between Western and
non-Western user groups, in order to achieve a deeper understanding of cross-cultural music perception.

In our future work, we plan to conduct user interviews in order to more thoroughly investigate how 
listeners from different cultures perceive music mood and triangulate the results obtained from this study. 
In addition to the survey data we obtained from this study, it will be insightful to have people describe the 
reasons for selecting particular music mood in their own words. We also learned that it is difficult to find 
Korean and Chinese mood terms that directly map to certain English mood terms. A number of mood terms 
can be translated to different Korean and Chinese terms (e.g., rollicking, volatile). For some English mood 
terms, it can be challenging to find a single term that still preserves the original meaning (e.g., campy) 
when they are translated to Korean or Chinese. Therefore in our future work, we plan to test different 
translations of the mood terms to check the consistency of the results presented in this study.


Abstract

Confronted by the challenges posed by the development of massive, open, online courses, design in 
information science research takes on a unique ontological character. Not simply a progression from 
human needs toward technological fulfillment, it comes to be understood as the eventful moment of the 
interplay of ethical decision and the material possibilities of technology. Conceptualized as such, design 
work presents an image of information science as progressive, deeply historical, and immanently 
concerned with the question of how to live. Starting from a consideration of the social-technical gap, the 
hermeneutic interplay of the distinct epistemological stances of ethics and technology is discussed, and 
an ontological understanding of design as centered on the logics of event and hospitality is introduced.


1 Introduction


The arrival of massive open online courses (MOOCs) has brought back a key issue in the design of
applications: the interplay between a single designer and a multiplicity of users. While many applications
and systems today (ranging from Google and Facebook to even something like Angry Birds) are designed by
a single person or by a very tight and small group of people, they are nevertheless used by thousands of
people of different ages and cultures. In the case of MOOCs, designers must be able to take the well-defined
content of a course and convey it to a massive, culturally diverse, and largely unknown
audience. In their design, MOOCs raise the question of how the specific intents and goals of a designer can
match those of a widening variety of users, each approaching the course from their own personal
and cultural perspective, all within the field of technology.

The modes of interaction found in MOOCs and other online systems provide an opportunity to
improve on the understanding of the ideas behind approaches to user-centered design. At its best,
user-centered design provides guidance for how to understand what needs to occur in design. As others have
noted, as an idea, user-centered design can be strained either by too much and too specific input from
users (Norman, 2005) or by a lack of access to users by virtue of distance (Blom, Chipchase, &
Lehikoinen, 2005; Crabtree et al., 2006). By inviting a diversity of users, all with unique cultural and
individual needs, MOOCs strain existing understandings of the task of design.

Central to these concerns is the assertion that social and cultural motivations are important for 
understanding use and that the work of design is itself embedded within larger socio-technical structures of 
traditions of use, notions of the possibilities offered by technology, and the historical context of design work 
(Irani, Vertesi, Dourish, Philip, & Grinter, 2010; Sengers et al., 2004). In all of this, there is a push toward 
understanding the work of design as contingent and specifically amethodological in its reliance on personal, 
interpretive, and sometimes arbitrary choices on the part of a designer (Neustaedter & Sengers, 2012). 


1.1 Online Education Design, Information Science Design 


The continuing development of the question of what online education design should be is reflected in the 
more general questioning over what information science is. Reflecting on the development of the definition
of information science, David Bawden (2008) noted that "[t]he cynic might say that we are still waiting for
this, nearly 30 years later" (p. 416). This cynical waiting for definition is, in many ways, not just a symptom
of the growing pains of a new discipline, but a necessary indeterminacy found in a field which plays a
largely synthetic role "concerned with the integration of the contributions of other sciences" (Meadow, 1979,
p. 221). For us, what is important here is not any particular definition of information science (as Bawden 
(2008) shows, there have been many views that have disputed this particular formulation), but rather that 
information science, whatever it may or may not be, is at its core, an evolving and progressive discipline, 
one that is concerned with subjects that themselves are evolving and progressive. As Ronald Day (2001) 
demonstrates, the question of how to understand information (and information science) is one that deserves 
a historical account, as it is a progressive and changing thing whose identity is shaped by the aims and 
machinations of the powers of each particular era. The progressive and historical nature of understanding 
of the role of information is what, perhaps, gives information science a sense of being undetermined. 

For us, the question of how to understand design in information science is, above all, an ontological 
question, one that serves to orient our research in a philosophically ontological manner (Fonseca, 2007). We 
are concerned with questions of design in information science and online education, and with using the 
fundamental question of what design in information science is in order to gain insight into the pressing 
questions that surround the increasingly important role that design plays in our society (Norman, 2010). 

In building out an ontological understanding of design in information science, we will start from 
the gap between human needs and technological possibilities, radicalizing this ontic distinction through a 
discussion of Hans-Georg Gadamer’s concept of the situation. In turn, this hermeneutic understanding of 
the situation will be examined under the conditions of a process of design. Following this, a conception of 
design as an event of ethics and technology will be developed, and their interplay in the form of Derridean
hospitality will be discussed. Finally, we will return to online education to demonstrate how this ontological 
view of design can be understood in context. 


2 Understanding Design in Information Science 


An often-cited (Carroll, 1997) starting point for understanding design in information science comes with 
the definition of design given by Herbert Simon (1969) as any course of action “aimed at changing existing 
situations into preferred ones” (p. 55). This definition is reinforced by Lois Lunin (2009) who in defining 
design says that it "can be defined as the combination of both the vision and the plans to realize the vision" 
(p. 1942). These definitions both highlight the double, quasi-dialectical ontology at work in information 
science. Lunin does this most overtly, with Simon's definition needing slightly more analysis in order to 
draw out the dual proposition of design. In both, there is a sense of the need for, as Lunin calls them, both 
vision and a plan to realize that vision: for a designer to have an idea and the technological means to make 
that idea real. 

This two-step work of design, composed of both an intent or idea and the means to effect that idea,
comes to be the hallmark of information science, particularly given its progressive and evolutionary stance.
As will be developed, these two impulses interact to form the ontological domain of design in information
science.


2.1 The Social-technical Gap 


Working in the area of computer-supported cooperative work (CSCW), Mark Ackerman provides a useful 
model for beginning to understand this dual nature of information science and information science design— 
albeit one that is not directed toward the present question. Looking at the "fundamental mismatch between 
what is required socially and what we can do technically" (p. 198), Ackerman (2000) diagnosed what he 
termed the "social-technical gap" looming between what is necessary and what is technologically possible. 
While human beings are nuanced, flexible, and often ambiguous, technological systems, if they are to
function, are not. In formulating this, Ackerman established a theoretical distinction in CSCW between
what is technologically possible and what is humanly required. For Ackerman (2000), it is this social- 
technical gap that gives CSCW its theoretical motivations and "although it certainly can be better 
understood and perhaps approached" (p. 198), this gap is unlikely to be done away with entirely. 

Ackerman (2000) defends this understanding of the social-technical gap against two symmetrical 
charges that seek to diminish its importance. The first is that "the social-technical gap will be solved shortly 
by some new technology or software technique" (p. 189). This can be, quite obviously, considered to be the 
technological solution to the gap, while the second, that "the gap is merely a historical circumstance and 
that we will adapt to the gap in some form" (p. 189), is obviously aligned with a social or human 
amelioration. 

What is interesting about each of these critiques—which Ackerman answers easily and provides a 
convincing case for the permanence of a social-technical gap within CSCW—is that they are both couched 
as historical and progressive steps toward the elimination of the gap that exists in current human uses of 
technology. That is, they are concerned with socio-technical design at large, extending beyond the 
immediate conditions of use that Ackerman addresses. Within this historically-developing and progressive 
setting, there remains a naggingly-present mismatch between the subjective and responsive needs of users, 
and the more staid (though still changeable and changing) artifacts of technological design. Ackerman’s 
explication of such a gap highlights the continual mutual adjustment that goes on in fitting technology to 
a situation. 


2.2 The Situation of Design 


The model of the gap that exists between potentially unlimited needs and the finite conditions of 
technological possibility is a theme that can be seen more generally in the work of Hans-Georg Gadamer 
(2004) and his consideration of the hermeneutic nature of understanding. Central to this is a mode of fore- 
understanding necessitated by the hermeneutic circle, with understanding only being possible based on an 
already-existent (historical or traditional) understanding. For Gadamer, understanding is only ever 
developed on the basis of the particular historical situation of the interpreter. As he puts it, “[u]nderstanding 
is, essentially, a historically effected event” (p. 299). As he develops his hermeneutic view of historical 
understanding, Gadamer works against the idea that it is possible to ever wholly take on someone else’s
understanding, and asserts that any interpretation is founded on the particular situation under which it is 
enacted. 

This situational and temporal view of understanding can be applied directly to the socio-technical 
gap in several, complementary ways. Most directly, the gap can be seen as attempting to acquire an accurate 
picture of use so that design work may be properly addressed. 

In their Understanding Computers and Cognition, Winograd and Flores quote Gadamer on the 
difficulty of attempting to provide an accurate depiction of the situation of use: 


“To acquire an awareness of a situation is, however, always a task of particular difficulty. The very 
idea of a situation means that we are not standing outside it and hence are unable to have any 
objective knowledge of it. We are always within the situation and to throw light on it is a task that 
is never entirely completed.” (Gadamer, as quoted in (Winograd & Flores, 1986, p. 29)) 


While, according to Vera and Simon (1993), Winograd and Flores use Gadamer's assertion to place 


"particular emphasis on the difference between acting in ill-structured, real-world situations as 
compared with well-structured, defined situations, arguing that symbolic approaches, even if they 
take account of the bounds of human rationality, cannot handle ill-structured situations 
adequately," (p. 12) 
a more fundamental issue beyond their concern for an ontic environment of action (and one more directly
related to the practice of design itself) is at stake. 

In describing the concept of the situation, Gadamer provides an account of the epistemic difficulty 
presented that resonates with Ackerman’s gap between what we know we should provide and what we are 
able to provide. Saying that we "are unable to have any objective knowledge of [the situation]," Gadamer 
attends to a more formidable difficulty beyond concern for unstructured needs and structured solutions. 
Just as in the socio-technical gap there is an inability to match human, subjective needs with the
affordances of an objective system, it is the task of hermeneutics “to consider the tension that exists between 
the identity of the common object and the changing situation in which it must be understood” (Gadamer, 
2004, p. 308). In their hurry to offer means to close the socio-technical gap under the rubric of progress 
(either social or technological), Ackerman’s critics instead reinforce the gap in a more fundamental way: 
the constant shifting of needs and technological means over time simply turns an ontic consideration of the 
gap (that of the practical question of proper fit) into a more ingrained and ontological gap in understanding 
as found in Gadamer’s hermeneutics. 

Of course, for both Ackerman and Gadamer, the challenges posed by a separation between need 
and technology or between object and interpretation are not insurmountable, but are instead largely 
productive. For Ackerman (2000), the explication of such a gap is itself important in refocusing research 
toward “[e]xploring, understanding, and hopefully ameliorating this socio-technical gap” (p. 179). For
Gadamer (2004), the “true locus of hermeneutics is this in-between” (p. 295). Similarly, for each, the 
distance discovered does not close off possibilities for research, but instead opens them up. Gadamer (2004), 
in particular, in his consideration of the human sciences, sees the uncovering of this kind of limiting structure
as immensely important and generative:


“Every finite present has its limitations. We define the concept of 'situation' by saying that it 
represents a standpoint that limits the possibility of vision. Hence essential to the concept of 
situation is the concept of 'Horizon.' The horizon is the range of vision that includes everything 
that can be seen from a particular vantage point. Applying this to the thinking mind, we speak of 
narrowness of horizon, of the possible expansion of horizon, of the opening up of new horizons, and 
so forth.” (p. 301) 


This concept of the horizon is the locus of how Gadamer’s approach to the tension between two settings is 
productive, with the task of hermeneutics being found “in not covering up this tension by attempting a 
naive assimilation of the two but in continuously bringing it out” (Gadamer, 2004, p. 305). Putting this in
terms of the socio-technical gap, the challenge is to not naively bring certain needs together with a solution, 
but to instead fuse the two perspectives by a hermeneutic dialogue between the different horizons of 
experience that they represent. “Projecting a historical horizon,” Gadamer (2004) reasons, 


“is only one phase in the process of understanding; it does not become solidified into the self- 
alienation of a past consciousness, but is overtaken by our own present horizon of understanding. 
In the process of understanding, a real fusing of horizons occurs—which means that as the historical 
horizon is projected, it is simultaneously superseded.” (pp. 305-306)


For Gadamer’s (2004) hermeneutic phenomenology, the interpretive processes that mirror and give shape
to this developing link between social needs and technological solutions necessarily have a grounding in the
common basis for interpretation that he poses:


“When our historical consciousness transposes itself in historical horizons, this does not entail
passing into alien worlds unconnected in any way with our own; instead, they together constitute
the one great horizon that moves from within and that, beyond the frontiers of the present,
embraces the historical depths of our self-consciousness. Everything contained in historical
consciousness is in fact embraced by a single historical horizon.” (p. 303)


Such an approach is closely tied to Gadamer’s insistence that the weight of traditional understandings bears 
heavily on any present interpretation, and that there is, despite whatever difference may be felt at any 
particular moment, one common core to any sense of “truth” that can be achieved through philosophical
hermeneutic reflection (Bernstein, 1982). That is, our needs and technological offerings are linked by a
common tradition of use and design.


3 The Ontology of Design in Information Science 


While maintaining an a priori separation between human needs and the technological answers to those 
needs is a useful approach to understanding the ontic theoretical problem-space of information science, such 
an approach proves difficult in providing an ontological account of the work of design. While Ackerman's 
social-technical gap provides a useful orientation for CSCW and gives ontic import to the theoretical work 
in the field, its stark division between technology and human needs is not ontologically viable for 
understanding the role of design in information science. Which is not to say that that is its purpose. 
Ackerman's gap describes the immediate space of what is needed out of technology—for technology to 
satisfy needs as a singular theoretical whole—but does not set the question of the match between technology 
and need within a more fundamental framing of the ontological relationship between human beings and 
technology that extends beyond the moment of use and to the moment of design. As does Gadamer’s 
hermeneutic approach, the model of the socio-technical gap struggles to approach the activity of design as 
it is spread across the situations founded by use and the situations founded by the technological artifacts 
involved. While each points to it, neither provides an ontological account of the enaction of the possibilities
offered. 

Still, an approach which isolates human need from technological capability under the banner of the 
general requirement of satisfaction does provide a useful beginning to understanding an ontology of design 
in information science. Following such an approach, the component parts of design, technological means
and intent, will be examined individually.


3.1 Technological Possibility 


Herbert Simon provides a good place to start in order to understand the role of technological possibility in 
design, particularly in that his conception of design is almost purely technical. While including the idea 
that the goal of design is to “make artifacts that have desired properties” (Simon, 1969, p. 129), he excludes 
this sense of imperative from his formal description of design, making it wholly about the question of 
optimization toward a goal. In judging success, the benchmark is always, "does the system designed create 
the desired change?" The technological branch of information science design is one of effectiveness that is 
couched in the ability to set up systems that do what one wants them to do. In online education, for 
example, the question of the technical ability to communicate a message from one place to another (where 
the question is of whether the message arrived or not) is wholly different from the question of what the 
intent or motivation, in terms of educational goals, seeks to accomplish. The former would be a question of 
technological possibility, while the latter is of ethical intention. 

In its basic appeal to what is technologically feasible, this sense of possibility has an almost universal 
and positivist character. If such a technological system is able to accomplish what it does for one person, it 
will do it for another. So, in the example of online education, what can be understood as being 
technologically universally possible is the fact that a message can be communicated using online tools. This 
says nothing about whether it will accomplish any particular educational aims. This mode of technological 
possibility exists independently of any cultural value or intent. In many ways, such an account begins to 
reinforce Ackerman’s social-technical gap, showing an unbreachable divide between human needs and
technology, albeit under different terms and conditions, and with different implications. 

Gadamer (2004) too distinguishes this kind of “technical” application from other types of
application, saying that


“[I]t is not only that moral knowledge has no merely particular end but pertains to right living in
general, whereas all technical knowledge is particular and serves particular ends. Nor is it the case 
simply that moral knowledge must take over where technical knowledge would be desirable but is 
unavailable. Certainly if technical knowledge were available, it would always make it unnecessary 
to deliberate with oneself about the subject. Where there is a techne, we must learn it and then we 
are able to find the right means. We see that moral knowledge, however, always requires this kind 
of self-deliberation.” (p. 318) 


The limits of technical knowledge, of the object of technology, then come to be understood in contrast with
moral knowledge, which “can never be knowable in advance” (Gadamer, 2004, p. 318) and “has to respond
to the demands of the situation of the moment” (Gadamer, 2004, p. 319). What Simon's (1969)
design leaves out when describing design work as being concerned "with devising artifacts to attain goals"
(p. 133) is an explanation of how to determine "how things ought to be" (p. 133). That is, for online 
education, there is, beyond any technical accomplishment, a fundamental question concerning the 
educational purposes and modes that should be instantiated in any technological design. 


3.2 Ethics and the Intention of Design 


As the development of a concern for culture and human perspectives within information science has shown 
(for example, (Ehn, 1988; Kling, 2007; Suchman, 2006)), the question of what should be done technologically 
under any particular circumstances is an important one. In many ways, the basic question of information 
science once the question of technological efficacy is momentarily suspended comes very close to Aristotle's 
(2004) original question concerning ethics: of what to do in order to live a good life. In framing his ethics, 
Aristotle was not concerned with a basic question of whether or not any discrete action is ethical or not, 
but rather with what should be done in order to achieve a good life. At issue here is a consideration of what 
types of activities one should invest their time in and how we should judge the outcome of any effect of our 
efforts. Looking beyond Simon's explication of design, when viewed from the position of an Aristotelian 
framing of human action, there is a distinctly ethical component to information science design. It is only 
once our goals and values have been examined and we have decided what should be done that we are able 
to design technological systems to achieve those things. 

Diverging from the picture of technological possibility as described in the previous section, this 
ethical question of what to do is not universally answerable. What is good for one person in one moment 
may not be good for another. More than just appealing to a sense of individual or cultural determination, 
this heterogeneity of ethical intent and desire is one that is situationally and historically derived. At its 
center, this kind of ethical variety is consequential particularly in the way in which it is subject to Gadamer’s 
concept of the situation of interpretation: that the circumstances and terms of any ethical consideration are 
always only able to be approached from an insular and situational perspective. 

While ethics in design has been discussed in many ways (Brey, 2000; Floridi, 1999; Friedman, Kahn, 
& Borning, 2006; Winner, 1980), we are not concerned here with the possible representation of any particular 
ethical system, but instead with a general inducement toward action that an Aristotelian consideration of 
ethics brings. What is of interest here is the way in which technological possibility interacts with the basic 
question of “how to live” and thus contributes to an ontology of information technology design. Neither the
question of ethical intent nor the question of technology, however, is limited to such a singular consideration, 
and each (when starting from such an a priori distinction) needs to be subject to a double consideration: 
first in their initial formulation (as technology and as intent) and then again when brought together in the 
activity of design itself. That is, when understood in an ontological fashion from an initial divide between
intent and technology, each aspect (the moral force of the technological action and the materials involved)
must necessarily be considered twice.


4 Ontological Doubling of Design 


What we see in the ontological constitution of the field of design in information science is a progressive 
interaction between these two distinct impulses: the technological and the ethical. In their inextricable 
connection and following on the theme developed by Gadamer, the two take on the character of the 
hermeneutic interplay between figure and ground (Martin & Fonseca, 2010). In traditional forms of 
hermeneutic textual interpretation, the meaning of a particular passage is interpreted based on a reading 
of the whole of the text (Grondin, 1994). The whole of the text (the ground) invests the particular portion 
(the figure) with its meaning and vice versa. Here, on the one hand, technological possibilities offer a field 
on which we are able to articulate the figure of our ethical ambitions, and on the other, our ethical ambitions 
serve as the field against which we derive technological innovations. 

Central here is that the ethical goals set in the process of design and current technological 
capabilities are each, and in their own ways, determinate of the ontological field of design in information 
science. Each is progressive and evolving and, following the figure of the hermeneutic circle, changes over
time in each instance of design. In this, the kind of ontological understanding that is developed in this 
hermeneutic process “proves to be an event” (Gadamer, 2004, p. 308). 


4.1 The Event of Design 


As has been described above, as a progressive and ever-changing field, the work of information science 
design comes to rely on the logic of the event in order to provide an ontological account of its development. 
Coming out of various veins of post-structuralist philosophy (Badiou, 2007; Deleuze, 1990; Derrida, 1995), 
the logic of the event focuses on the absolute uniqueness of certain sets of occurrences. In the present use 
of the term, "event" explicitly means that which is not typical or universal and finds some lineage with 
Heidegger's Augenblick in which "[t]he singularity and uniqueness of the moment is a crisis calling for an 
individuating decision and resoluteness in response to the situation" (Nelson, 2007, p. 103), as well as with 
Gadamer’s (2004) consideration of the phronetic instance of legal judgment in which “every law is 
necessarily in tension with concrete action” (p. 316). That is, such a moment cannot rely on general 
prescriptions for action and instead pushes them away. It is a "dynamic and unstable moment" which 


"destabilizes pre-existing concepts and habits, even while it evades and resists normalization and 
being subsumed under categories, classes, and universals." (Nelson, 2007, p. 103) 


In this, design, as it is comprised of the moment of ethical decision against a backdrop of technical possibility 
(and vice versa), takes on a unique ontological character. It is not simply the progression from human needs 
toward technological fulfillment, but it is a unique and eventful moment in history which comes to be in 
the interplay of our ethical decisions (in the Aristotelian sense) and the material possibilities of technology. 
It punctuates the otherwise constant progression of information science. 

This event, as such, revokes previous considerations and presents a new historical situation (in the 
Gadamerian sense) to the designer involved. This new situation is unique from anything previous, and 
accounts for the mode of innovation or newness that is found in design (Bødker, 1998). In Derridian (1995) 
terms, this new situation, as the eventful confluence of ethical intent and technological possibility, becomes
a kind of mysterium tremendum in that


“[e]ven if one thinks one knows what is going to happen, the new instant of that happening remains
untouched, still unaccessible, in fact unlivable.” (p. 54) 
Above and beyond the mode of hermeneutic interaction of intent and technology that is seen in producing
the event of design, there is a further step required, one that is necessary in order to frame the work of 
design in a temporal and evolving setting such as presented by information science in general. 


4.2 Integration and Hospitality 


In looking at the eventful interaction of the dual impulses of design, it is useful to consider a conceptual 
position introduced to information science by Claudio Ciborra (1999, 2004) as a way to re-orient the 
ontological understanding of information system design and organizational integration (Brigham & Introna, 
2006). Also focusing on a mode of design work, Ciborra approaches design from a less immediate position 
than here, and attends to wider, more systematic concerns. Nevertheless, examining the relationship 
between technological artifacts and organizations, Ciborra uses Derrida's (2000) concept of hospitality to 
re-figure the relationship between existent organizational practices and practices that are introduced by a 
new technology. The concept of hospitality, for Derrida, relies on a radical acceptance and openness to the 
coming of a stranger. The stranger, in being appropriately welcomed, is treated as the equal of the host, given
the same rights and opportunities as the host, all the while still remaining only a guest: “The guest becomes 
the host's host. The guest becomes the host of the host” (Derrida & Dufourmantelle, 2000, p. 125). 

For Ciborra, while this logic of hospitality provides an insight into how information system design 
should approach the integration of a new technological system into an existent social one, what is important 
for us is the ontological picture of design that it provides. In looking at the event of design as occurring 
with the interplay of technological possibility and human ethical intent, it is possible to first see the way in 
which each of these discrete impulses welcomes the other, while still each remaining distinct. Just as it 
would be impossible to imagine any form of information technology (as material artifact) to exist without 
some motivating human intent, neither would it be possible for human intention toward information 
(whether considered traditionally technological or not) to exist without the object on which it can project 
that intention (Day, 2011). In both cases, the one opens itself completely to the other. 

While this largely follows the already-discerned hermeneutic structure of the interaction of figure 
and ground, there is one distinct difference between the kind of interaction that is present in Gadamerian 
hermeneutics and the picture of hospitality drawn out by Derrida. Whereas the case of hermeneutics is 
predicated on the necessity of some pre-given tradition on which to build an interpretation (as in the case 
of the interpretation of the law by a judge), Derrida’s (2000) 


“unconditional law of hospitality, if such a thing is thinkable, would then be a law without 
imperative, without order and without duty.” (p. 83) 


In our scenario, such unconditional logic is what allows for any kind of newness or innovation in the work 
of design to appear. 

More importantly than just providing an alternate and more immediately progressive picture of the 
interaction of intention and technology found in hermeneutics, the mode of disjunction seen in the concept 
of hospitality gives shape to the nature of the event of the interaction between the two as well as the kind 
of innovative and progressive newness that design brings. As Derrida (2000) describes it, 


“absolute hospitality requires that I open up my home and that I give not only to the foreigner . . .
but to the absolute, unknown, anonymous other.” (p. 25)


That is, there comes to be a decisive acceptance of the result of the conjunction of the initial event of design; 
when confronted with the unexpected and heretofore unknown conjunction of the ethical intent of design 
work and the technological materials of it, there is an ontological necessity that such an event be welcomed, 
even as it may be unknown. 

In this ontological picture of design in which technological possibility comes together with ethical 
intent in an event in which each opens itself to the hospitality of the other, design achieves a fully historical
and situational character. The decisions of design concerning this mode of hospitality become, for designers, 
truly ethical decisions: “ethics is hospitality” (Derrida & Dufourmantelle, 2000, p. 17). 

In Derrida’s (1995) account of the moment of ethical decision, the ethical decision is that which 
cannot be planned out in advance. If one were able to decide before the occurrence of an ethical decision 
what the correct decision would be, then such a decision would not in fact be an ethical one. It is this 
inability to prescribe the outcome of any ethical decision that leads design as a whole toward a logic of the
event and hospitality. In this temporal contingency, in which designers are faced both with judgments of 
ethical intents previous to the event of design, and in the event of design itself when such ethics comes into 
relation with technological possibility, design takes on the character of a doubly ethical moment. Design 
becomes strung between these two moments of decision. 


5 The Case of the Event of Online Education 


As has been sketched out, an ontology of online education design can be conceived of as an eventful
interaction of intention and technological possibility. There are both moral aims in education, as well as 
technological concerns that, more in line with Simon’s more engineering-centric picture of design, serve to 
provide a distinct and portable formulation of how to achieve some goal. The technological tools that make 
such things as long distance and distributed communication possible are a kind of accomplishment that can 
be considered in a way wholly distinct from educational intents.

This independence comes to an end with the event of design in which the ideals of education are 
expressed in technological terms, or, conversely, when technological tools are given purpose in an educational 
context. While an initially hermeneutic rendering of this points to the revelation of a unified horizon which 
supports the two distinct positions, the event of design introduces such radical alterity that it must be 
confronted in a mode of hospitality. That is, from the perspective of design work, the event of design does
not reveal anything about the world; rather, it asks how this new design may be welcomed into the world.
This can be seen in the case of MOOCs: rather than providing insight into the existing conditions of 
education, their design has challenged present understandings of what education can be going forward 
(Russell et al., 2013). 

In the ontology of online education design, there is not simply a gap between technological 
capabilities and the needs of users. Instead, both the aims of education and of technology are understood 
to be developing, with the event of design bringing these two distinct epistemological framings together into 
a coherent formation. For designers, this places their work within a specifically historical moment, one 
which not only provides the ethos of education and the technological tools, but also the surrounding 
situation into which their work will be welcomed. 


6 Conclusion 


The question of design in information science is one which presses the progressive and historically-situated 
character of both information science and design. Rather than being focused on a staid socio-technical gap 
that provides a model for the field, an ontological understanding of design relies on a dynamic account of 
the interaction that occurs within the space of the gap, and presents a series of ethical questions to designers 
concerning the purposes of their work, and how that work may be integrated into a larger field of socio- 
technical activities. MOOCs, with their field-defining potential, particularly serve to provoke ontological 
questions concerning the nature of design in information science. 

In laying out an ontological and ethical understanding of design which relies on the concepts of 
event and hospitality, our work points to future questions concerning the determination of intention in 
design work, how design decisions may be better understood in large-scale design situations, and how the 
technological and ethical possibilities come to be established. As the rise of online education demonstrates, 
fundamental aspects of human activity have become intertwined with questions of technology, and there
remains much to do in understanding the complexity of their relationship.
Abstract

Purpose. Effective Research Data Management (RDM) has become an increasing concern in UK 
universities as a result of being mandated by research funders. The study uncovered how librarians, IT 
staff and research administrators viewed support of RDM and how they thought roles would be 
distributed amongst them. It used Abbott’s theory of the professions as a way of conceptualising the 
underlying dynamics. 

Methodology: Data was collected through 20 semi-structured interviews with staff in the Library, IT 
Services and Research Office of a research intensive university of middling size in Northern England.

Findings: The different professional services viewed RDM differently. Broadly speaking, IT focussed on
short term data storage; the research office on compliance and research quality; librarians on preservation 
and advocacy. The Library was the only department claiming a new jurisdiction in RDM. The other 
departments claimed to be short of resources to take on such a complex project. Some interviewees feared 
RDM might be a “poisoned chalice”. 

Research implications: Abbott’s (1988) concept of jurisdiction is a useful lens on how RDM services are 
emerging. 

Originality/value: The paper offers an early perspective on how support of RDM is being developed, from
a theory of the professions perspective. 
1 Introduction


Increasing recognition of the value, volume, velocity, variety and vulnerability of research data has led 
funders in the UK to mandate better research data management (RDM) (RCUK 2011; Pryor 2012). Many 
people believe open data is key to research quality and scientific progress (Royal Society 2012); this also
implies the need for better management of data. Research funding applications now require data 
management plans. A critical event in raising the RDM agenda in the UK was Engineering and Physical 
Sciences Research Council (EPSRC) asking all UK Higher Education Institutions (HEIs) to formulate a 
roadmap outlining how they would comply with the new RDM requirements by May 2012 and fully comply 
by May 2015. Critically, the funders place the responsibility for RDM on researchers and their institutions 
(Jones et al. 2013). Indeed, evidence, such as recent surveys (e.g. Cox and Pinfield, 2013; Corrall et al.,
2013), suggests that in the UK academic libraries are taking on or planning a range of roles in RDM, as part
of a wider movement to offer more support to research in general (Auckland 2012). Roles have been 
identified in the areas of policy; advice and signposting; training; auditing of research assets and creation 
of institutional data repositories (Corrall, 2012; Cox et al., 2012; Lyon, 2012; Lewis, 2010; Gabridge, 2009). This
work could be spread across a number of library teams: e.g. the liaison team, metadata specialists, special

Yet it is also clear that a number of other professional services will be involved in supporting RDM:
particularly research administrators and computing services, as well as researchers themselves
(Jones et al., 2013; Hodson and Jones, 2013). Little has been written about the differing responses among
professional services to the new RDM agenda and how such professional services will work together. This 
paper reports a study that begins to address this gap in the literature through an interview-based study of
support professionals at one institution. 

The paper is laid out as follows. It begins by discussing the theoretical framework for the study, 
Abbott’s theory of professions. It then considers what we know about the professional communities of 
librarians, research administrators and computing services. The methodology of the study is introduced. 
Findings from thematic analysis of the interviews are laid out and then discussed in relation to the literature. 


2 Theoretical framework 


Abbott’s (1988) theory of professions explores their development through struggles for jurisdiction. Abbott 
himself has written specifically about the information professions (1988; 1998) and others have used his 
theories, especially to examine librarianship’s relationship to IT (Cox and Corrall, 2013; Ray, 2001; Danner, 
1998; Van House and Sutton, 1996). According to Abbott, professions are in constant competition with one 
another because the environment in which they operate is continuously changing, e.g. due to social-cultural 
and technological change. Abbott's system of professions is “a world of pushing and shoving, of contests 
won and lost” (Abbott, 1998, p. 433). In essence, the theory states that professions seek to claim exclusivity
over certain areas of work, which Abbott labels “jurisdiction”. Claims for jurisdiction can be made in
three different ways: 


1. through acquisition of power to license and regulate those who may perform the area of work by 
means of a professional organization, 

2. through creating a public image that associates the profession with that area of work, 

3. and through direct competition with other occupations and professions in the workplace. 


Professions cannot occupy a jurisdiction “without either finding it vacant or fighting for it” (1998, p. 86): if
there is a vacant jurisdiction, such as RDM, this will trigger events in which adjacent professions
dispute each other’s jurisdiction. Such disputes can be resolved in a number of ways. For example, they can
lead to either full jurisdiction for one profession, or to the subordination of a number of professions to 
another one. The dispute could also result in a standoff that leads to a more or less equal division of the 
jurisdiction into interdependent parts. Abbott calls this a division of labour or a divided jurisdiction. 

Thus Abbott’s theory places the project of professionalisation centre stage and follows the story of 
professions battling for territory and trying to create a strong sub-culture, knowledge base, ethical code and 
a degree of autonomy. For the individual, professions are relatively stable structures within which to build 
an identity and career. Yet from a managerialist perspective the autonomy, credentialisation and boundaries 
implied by professionalisation are costs that weigh against the benefits of professional expertise. This is 
perhaps under-theorised in Abbott, because his focus is on professions. For profession-based services
embedded in organisations this is an important context.


3 Professional services and their relations: libraries and librarians 


Librarians belong to an established profession with a long tradition. Although professional bodies also exist for
research administration and IT services, they are far less well established and less authoritative. The LIS
profession has sometimes been studied from the viewpoint of its competition with neighbouring professions 
and occupations, most notably Information Technologists. Partly using Abbott as their theoretical 
framework, Van House and Sutton (1996) argue that librarianship is under threat from other professions 
and academic disciplines: “LIS risks being outnumbered, outmanoeuvred, and rendered marginal” (p.145). 
Most notably, this threat comes from Information Technology, digital information (both digitized and
born-digital), and the Internet. It is argued that this has led to “a reinvention of the access role”, the core
jurisdiction of librarianship for Abbott, potentially in competition with IT professionals, in which the Library
has taken on the management of electronic content (Cox and Corrall, 2013). The inclusion of IT-related tasks
in the LIS profession’s jurisdiction is combined with the advent of Information Literacy (IL) as a
preoccupation and the librarian’s increasing educational/teaching role (O’Connor, 2008, 2009).

It may be that a similar response to the still increasing threat to the library’s traditional access role
is happening within the RDM agenda (Cox and Pinfield, 2013). RDM could be seen as an extension of the
growing part academic libraries are playing in institutional repository management: another attempt “to
expand the profession’s access jurisdiction into new areas” (Cox and Corrall, 2013, p. 12).


4 Research offices and research administrators 


Research administration plays “an important part in formulating, developing, supporting, monitoring, 
evaluating and promoting” university research (Hockey and Allen-Collinson, 2009, p. 142). Originally, the 
function of research administration belonged to the task set of academic staff. Macfarlane (2011) discusses 
how “all-round” academic practice has been unbundled and some specialist functions such as research 
administration have become the domain of what he calls the “para-academic”. This trend has been 
stimulated by a more managerialist approach to university governance since the 1980s, and subsequently 
by a specialization of administrative support functions. This was caused in particular by increasing 
administrative and regulatory demands on universities from government. The growth of specialist research 
administration has also arisen from competition for externally funded research. 

Unlike librarianship, the occupation of research administrators does not appear to have many traits 
of a profession. Green and Langley (2009) report that there is a lack of accredited professional training, 
appropriate and nationally recognized qualifications, and clear career progression in the field. Although 
there is a professional body, the Association for Research Managers and Administrators (ARMA), it only
has around 1,900 members (ARMA, 2013). Indeed, Green and Langley’s (2009, p. 17) survey showed “an
embryonic profession struggling to create an identity”. They found that many research administrators did 
not feel well understood by either academics or their colleagues from the other support services. 


5 IT services and IT professionals 


Whereas research administration is a small professional group specific to academia, the IT profession is a 
large occupation spread over many sectors of work and little of the literature on it is specific to the HE 
context. Although an economically and culturally significant occupation, it is not organised in professional 
terms like librarianship. Professional bodies have sought to credentialise skills in IT, but the speed of change 
in IT has prevented them achieving occupational closure (Danner, 1998). One strand of studies of IT has
applied Trice's (1993) theoretical framework to IT professionals (Guzman and Stanton, 2009; Guzman et
al., 2008). These studies have found that IT professionals have a distinctive occupational subculture. A
number of studies have compared librarians with IT professionals. The literature on convergence of library
and IT services usually acknowledges the cultural differences between the two (e.g. Joint, 2011). Creth (1993)
argues that these differences stem from their education: librarians share “a process of acculturation” through 
their dedicated and accredited university courses in librarianship, but IT professionals do not have a shared 
socialization process and therefore no “shared professional history and values”. 

Creth (1993) reports a list of conflicting and shared values between the two professions that was
compiled by participants in one of her workshops. The list contrasts the technical orientation of IT
professionals with the service orientation of librarians, and IT professionals’ entrepreneurial behaviour with
librarians’ need for consensus. The two groups have a professional orientation in common, and a concern for
the well-being of their institution. Favini (1997), in contrast, argues that the two have little in common, apart
from the fact that they both use technology to support the university's academic mission; as a result areas of
overlap have been formed. But the professions have different tasks that require different skill sets and
attract different kinds of people with different personalities.


6 Research questions 


The literature does a little to establish the character of each professional group and something about how 
their relationships might shape the response to new agendas, such as RDM. It offers a conceptual framework 
for studying this in the system of professions. The emergence of RDM as an area of possible new joint
activity is an opportunity to examine the nature of relationships between professional groups within
universities. The research questions addressed in this research were:


1. How do different professional groups see RDM? 
2. How do they think RDM support roles may be distributed between them? 
3. Does the concept of jurisdiction enrich our understanding of how RDM is received as an agenda?


7 Methodology 


The research adopted an interpretivist methodology; the purpose was to understand how social actors 
themselves saw RDM. Data was collected through semi-structured interviews with professional services staff 
in one HEI in northern England. The institution is a research-intensive university of average size with
separate departments for library and IT services (not a converged service) and with a centralized research
office, henceforth referred to as Library, IT Services, and Research Office. Cox and Pinfield (2013) found
that most HEIs are still in the early stages with regard to planning and implementing an RDM support
service and that libraries are usually taking on a leadership role. In that light, the HEI in this study could 
be seen as having many typical features. By the time the interviews were undertaken in the period between 
February and April 2013, an RDM service had not yet been set up. Meanwhile it had become clear that 
the Library would play a leading role. 

A series of 20 semi-structured one-to-one interviews lasting between 45 and 90 minutes each were 
conducted. University of Sheffield ethics procedures were followed to gain voluntary informed consent from 
participants. The purpose of the interviews was to gather insight into: 


• the professional identity of the interviewees, including their relationships with academics and other
support services,

• their views on research, and specifically on RDM, including drivers and barriers,

• their views on the relationships with other professional services with regard to setting up and
running an RDM infrastructure.


The approach to sampling interviewees was non-probabilistic but purposive, seeking to represent a good
spread of job roles. For each of the services, both managers and non-managers were interviewed; the sample 
was also deliberately chosen to display a spread over different relevant units within the departments. It 
comprised both income capture officers and those involved in research governance (good research practice) 
in the Research Office (four interviews), managers, subject liaison librarians, metadata specialists and 
systems librarians in the Library (eleven interviews), and those involved in infrastructure (hardware) and 
applications (software), information security and records management in IT Services (five interviews). The 
emphasis lies on the Library because of its leading role in this university’s RDM activities. The interviews 
were recorded, transcribed and then analysed using thematic analysis (Braun and Clarke, 2008). Through
careful reading and re-reading of the transcripts, a framework or “matrix [...] for ordering and synthesising
data” was developed in an Excel spreadsheet and applied to the data (Bryman, 2012; Ritchie et
al., 2003, p. 219). The data set contained over 170,000 words.


8 Findings: Professional views of RDM


When participants were asked to define RDM, particular topics seemed to be associated with particular
professional stakeholders. Thus the storage of active data was largely a concern of IT professionals. They 
viewed RDM as predominantly (but not solely) a storage issue from a systems engineering perspective. The 
emphasis lay on short-term storage of “active” data. One of the participants explained that in his experience, 
academics are always concerned about storage for their operational data rather than about issues involving 
metadata and data sharing. He argued that long-term storage and data sharing are what most people may 
think of as RDM, but that it is only part of the story and possibly not even the most pressing one: 


Longer term, there’s the whole archival retrieval area and kind of support things, like open data 
access, which is often what people think data management is: it’s about the archival bit and it’s 
about linking data to research outputs, which is one aspect of it. But for many people the things 
they struggle with is actually: how do I deal with the stuff now? What is good practice? 


Some of the specialists interviewed such as the expert in high performance computing, the information 
security expert and the records manager (who was located under the computing service umbrella) naturally 
saw RDM through the lens of their specialism. The records manager felt strongly that RDM was very much 
in the domain of his professional expertise. Yet his limited resources made it hard to forward a claim to a 
lead role. 

Those working in the Research Office defined RDM mostly as the long term storage of non-active 
data, and the sharing of these data. One of the participants argued this was the whole point of RDM: 


As an institution we'll create a lot data and information from academic research, and it’s how we 
collate, store, and communicate that to other people either internally or externally. So it’s all very 
well spending a lot of time doing a piece of research and creating a lot of useful information if
nobody else ever knows about it.


Yet participants from the Research Office also emphasised the limitations to open data, such as ethical and
legal obligations in relation to the Data Protection Act, and contractual obligations. Such concerns were
mentioned only in passing by a very limited number of participants from the other service departments.
The most important drivers for interviewees from the Research Office were attractiveness of the University’s
research for research funders (which includes compliance with their requirements), and the quality of the
research.

Librarians were more varied in their responses than the other stakeholder groups. The Open Access 
officer defined RDM as an extension of her role, and followed the IT professionals’ division of RDM into 
active and non-active data. Both she and the metadata specialist specifically highlighted the open data 
aspect of RDM, and emphasized the role of metadata in data sharing: 


How are you going to make your data useable by other people who don’t have your background? 
So that has a lot to do with descriptions of the data, the [...] metadata.


By contrast, Library managers defined RDM as a challenge. One of them saw the challenge not in storage 
— “I don’t perceive storage of data to be difficult, or indeed expensive in this day and age” — but in advocacy: 


I think part of the challenge is in the advocacy, and I don’t just mean the skilling-up of library, 
information and computing people to deal with the situation, but advocacy as far as the academics
are concerned.


As regards drivers, some referred to the library’s traditional role of providing access to information, to the 
open access agenda, and to the Library’s educational role: 


If you’re going to have a more open approach to research data, you need to organize it. It needs to
be described properly. [...] It’s about [...] having it organized enough so that people can find it, use
it, evaluate it, reuse it, etc. And that’s absolutely central to a librarian’s role. [...] But also the
training, the fact that we’re good at signposting, providing guidance/training in handling
information. That’s what we do.


The liaison librarians focused more on what mattered directly to their roles. They highlighted queries from 
academics as their most important driver, but at present they were getting hardly any. 

In summary, IT services saw RDM as about data storage, especially active data storage and
information security. Research administrators tended to see it as about data sharing, driven by research
quality and compliance with funders’ requirements. Librarians saw it as about data storage and preservation,
but also advocacy and training.


9 Findings: Distribution of RDM roles between services 


When IT staff were asked to define their role in RDM, they highlighted first of all storage from both an 
infrastructure (hardware) and an application (software) point of view, and secondly guidance, training and 
support as the areas they were likely to get involved in. One of the managers described their involvement 
as “providing the bedrock either directly or indirectly”. He saw the management of active data as “likely 
to be a discussion between [IT Services] and the researchers themselves, to really tease out what their needs 
are.” However, the management of non-active data was seen as a collaborative effort with the Library, 
where the Library would take control of the “management of long term repositories”. For him, a research 
data repository would be “just another system”. Advice, guidance and training was another service that IT 
professionals felt responsible for, although they described it as a shared responsibility, especially with the 
Library. 

The two participants from the Research Office’s income capture team saw their involvement in 
RDM as limited. They did see signposting and advice on Data Management Plans as belonging to their 
remit, although perhaps not something they yet had expertise in. However, they thought RDM would not 
impact on their role in any major way because they work “pre-award”. They felt the research governance 
team would be more involved in RDM, because they operate “post-award”. But the participant from that 
team saw her involvement in a similar way to the income capture officers: providing guidance, support, and 
awareness-raising.

The preservation of data in particular was identified as an area in which, at present, only the Library had an
interest. Providing guidance, training and support was identified as a role for the liaison librarians.

When participants were asked to identify any areas of overlap or even conflict and competition 
between the professional services in RDM, not all were prepared to talk in terms of conflict and competition. 

Others, however, did see a competitive element: 


I can see that competition will come into it [...]. If it was me personally I’d say, “no”. But in reality 
I think, “yes”. There will be competition and that’s part of the problem. There is a bit of jostling 
for position over this. 


There were three main areas of overlap and contention that participants identified: systems specifications; 
training, advice, and guidance; and leadership. One of the Library managers mentioned storage as an area 
of overlapping roles. Both in the IT department and the Library there are “systems people” with expertise 
in “the technical infrastructure”. They may both be involved in defining the specifications for storage 
systems. This would also be true for the Research Office. In particular, they need to be involved in the 
specification of the metadata that needs to be collected from funded research projects. 

Most participants thought there might be an overlap in the provision of training, advice and guidance. It
was generally assumed that all three departments would be involved, but that there was a danger that the 
information they provided would be inconsistent: 


I mean we’ve got to be very careful that we don't have contradictory messages out there. We just
need to make sure it’s the same message to everybody wherever it’s coming from.


Participants referred to a natural division of training roles between the departments, although they
identified areas of overlap such as practical data management, ethical considerations and data
security.

A Library manager suggested there might also be competition over the branding of the RDM 
support service: 


It will be over silly little things like where to host the web page, because that seems to matter: 
Whose brand is it going to be? It’s around the branding, I think, where the most competition will 
arise, because: which URL? 


Less an area of overlap and more an area of direct competition was the question of RDM leadership. One 
of the IT managers identified RDM as “quite a major area and it is quite high profile” which could be both 
an opportunity and yet also a poisoned chalice. It is an opportunity, he argued, because “there is a big 
demand out there for help” and “it is an important part of our role to actually provide that for people”. 
However, RDM could be seen as a hazardous area for two reasons. First, at the most senior level of
university management there were differences about whether the agenda should be taken seriously, e.g. over
whether funders would enforce compliance. Some Research Office staff seemed to share these concerns.
Secondly, the selection of data to be preserved in the long term could be controversial: 


Those are the real challenges: if we’re going to keep things, it’s not so much even a question of 
where do we keep it, it’s: what do we actually keep, what’s of value? And the view in some quarters 
is that 99% of data is actually useless and you might as well just throw it away. So that’s the 
poisoned chalice bit, I think. 


The Library, by contrast, seemed willing to take the lead in RDM. One of the Library managers described 
RDM as an integral part of the profession in the future: 


Helping to curate research data management is going to be vital to the profession. 


She argued that RDM is vital to the profession because providing access to academic information is the 
Library’s main role, whether this information is bought in from publishers, or produced by the university’s 
own academics: 


We look after academic stuff. I know it could be a printed notebook or it could be a really complex 
experimental output, it could be raw data, it could be publications, all sorts of stuff. We’re in the 
business of looking after whatever this institution puts out into the world, and not just in the 
business of buying stuff in from elsewhere. 


10 Discussion 


Some participants were reluctant to talk about there being conflict over RDM. What the three professional 
services had in common was a shared commitment to organisational purposes and especially to service 
delivery. This was most clearly articulated by one of the IT professionals, who thought there could be no 
significant difference between the professional cultures because all were committed “to the provision of a 
service which they want to be high quality, and they want to make sure that the customers that use that 
service are satisfied.” Such discourses overlay any sense of an immediate jurisdictional dispute or clash of 
professional cultures. However, the nature of the services and the customers they aim to satisfy are different,
as are the relationships that the participants perceived themselves to have with the other professional services.

These differences in the nature of the services provided were reflected in the different views of RDM 
that the participants expressed and how they thought the tasks should be divided. Predictably, participants 
from IT Services defined RDM predominantly as the storage of active and non-active data. This was a 
distinction that was only very infrequently made by participants from the other services. Participants from 
the Library and the Research Office were not concerned with the short-term storage of active data, but 
with the long-term storage of non-active data and with data sharing. One of the IT professionals summed 
up the difference: the Library and the Research Office are interested in the end product, whereas the IT 
department focuses on the entire lifecycle, “right the way from the start or even the pre-start, because you 
have to fathom what it is you want to do and store and how you are going to do it, all the way through to 
the end product.” However, as far as the drivers are concerned, there were differences between the Research 
Office and the Library. Participants from the Research Office saw the attractiveness of the institution’s 
research to research funders and the associated issue of research quality as the main driver to engage in 
RDM, whilst not all librarians had a sense of an intrinsic driver and some of the non-managerial staff 
seemed reluctant to engage in RDM. However, managerial staff in particular framed engagement in RDM
as an opportunity, suited to the Library’s traditional access role, its existing skills in information
management, and its championship of open access and digital preservation. Importantly, they also
considered it to be an integral part of the profession, something that none of the other interviewees
commented on.

The division of roles regarding the management of long-term non-active data (data selection and
handover, data repositories, data catalogues) and guidance, training and support was unclear. Critically,
librarians identified RDM as a likely integral part of librarianship, and they highlighted the alignment of 
RDM tasks with current Library expertise. This prompted them to claim a leadership role. It would appear 
that any conflict over professional jurisdiction in an Abbottonian sense would most likely involve IT
Services and the Library. An on-going jurisdictional conflict between these two professions is already known
from the literature (e.g. Cox and Corrall, 2013; Ray, 2001; Danner, 1998). In this particular case, the
interviews suggested that the Library was indeed keen to extend its jurisdiction into RDM, more so than 
IT Services. IT professionals seemed to consider RDM from their usual perspective as deliverers of an 
infrastructure as a (paid for) service, and they did not appear to be enthusiastic to expand that role into 
the actual management of data. Indeed, the Library was already proactively taking the lead: they had 
designed the institution’s RDM policy, and were leading the institution’s efforts to implement an RDM 
service. As one of the IT managers said: “The library seem very keen to lead on it and I think the rest of 
us are quite happy to sit back and let them do it.” 

Through Abbott, the driver to take on RDM could be interpreted as the result of pressure on
the longstanding access jurisdiction of librarians. Across the interviews, the Library emerged as more
explicitly uncertain and concerned about its role in the institution and in RDM in particular than the other
professional departments. Ironically, librarians were also the least well informed about the nature of 
academic research. The number of staff with PhDs, for example, was significantly lower than in the Research 
Office and IT Services, both in the sample of this study and in the whole population. As a big professional
group in most academic institutions, the library has the resources to stake a claim for jurisdiction over RDM.
Smaller groups such as records managers or even the Research Office are disadvantaged in this respect. IT
Services were less keen to claim the area. Perhaps this was partly because the resourcing of the area was unclear.
IT Services defined themselves with a slight feeling of ethnocentric (Trice, 1993) superiority as the “funnel”
through which all information has to pass in general, and as the “bedrock” of any RDM service in particular. 
This sense of strength could be seen to rest on IT Services themselves providing a service to the other 
professional departments, and there are therefore relationships of interdependency with both the Library
and the Research Office. Secure in this position, IT Services had less need to claim jurisdiction over RDM.

The Library was not fighting for “full jurisdiction” over RDM; Abbott defines full jurisdiction as 
complete control over an area of work, subordinating the other professions involved. It would rather seem 
that the parties are working towards a “divided jurisdiction”: a situation where the dispute ends in a more 
or less equal division of labour between interdependent parts. This would in many respects be expected to 
reflect a “natural” division of labour established in other areas where work was divided between the different 
services. Yet the evidence showed that there was scope for conflict between the professional services, such 
as those resulting from varying priorities in interdependent relationships, and possibly even some form of 
competition over issues such as the branding of the service, but that on a higher managerial level the 
consideration of the benefit to the organization might very well prevail over professional dispute. 


11 Conclusion 


Adopting the lens of Abbott’s theory is a useful way of looking at RDM. Through his theory, RDM may be 
considered an arena where various professions meet and vie for jurisdiction over a newly emerged area of 
work. However, of all stakeholders involved, the Library was the only professional department trying to 
claim a new jurisdiction in RDM. The Library’s proactive steps into this area reflect an already
long-standing movement within the profession to extend its jurisdiction into a more IT-based direction, into
training and tentatively also into research support, e.g. through open access for research publications. The 
interviews support this interpretation: they show that the Library sees its involvement mainly as a provider 
of access to research data via a repository, and as a provider of training, guidance and support to the 
research community. RDM can therefore be seen as a new area of work for the Library in the form of an 
extension of areas of work it has recently tried to claim. Although this involvement in RDM may represent 
a claim to a new jurisdiction, there was no evidence from the interviews that this resulted in a full-blown 
Abbottonian struggle between competing professions. The departments in this case study were happy with 
the Library’s lead; they claimed to be short of resources to take on such a complex project, and some feared 
RDM may be a “poisoned chalice”. It would therefore seem that the Library’s willingness to enter a new 
area of work (seen as an opportunity), combined with the relative reluctance of other stakeholders to lead 
on RDM, and a shared concern for the common good of the organization, does not result in an Abbottonian 
struggle over work. Abbott’s theory focuses on the relations of professions; for profession-based services in
organisations the tension between the profession’s interests and the good of the organisation is a key context.

The research presented in this paper was an investigation of RDM provision in a single
research-intensive university. This institution had a centralized research support office but not a converged
IT/library service. Other institutions will have different constellations of service and different existing 
relationships between the professional services prior to the emergence of the RDM agenda. The size of the 
institution and its balance of research activity are also important. The way forces work themselves out 
would clearly be somewhat different in a non-research focussed institution. It would also be very significant 
what the authority structures of the institution were like. A more managerialist environment would mean 
that professional autonomy and conflict would be much more likely to be curbed. Further, how the forces 
identified here will play out as actual services are created remains to be seen. Different pictures may 
therefore emerge. In some institutions the Library will not take the lead on RDM. It is a truism among those
working in the field of RDM that every institution differs in how it approaches RDM, because of
the complexity of the issues. Precisely in order to explore this complexity, more in-depth studies are required.
The paper does demonstrate, however, that an approach informed by Abbott’s theory of the professions is 
a useful perspective. It is plausible because it chimes with what we already know about support services as 
professional communities. Although the relations between libraries and IT services have been of interest, at 
least to scholars and practitioners from librarianship, relatively little research has been published on their
relationships, on the internal organisation of computing services in academia, and little that connects to
wider research on university administration (e.g. Whitchurch 2012). Further consideration needs to be given 
to theoretical alternatives to Abbott, such as Whitchurch’s (2012) concept of Third Space. These are 
promising lines of research inquiry for broadening our perspectives on library work: to understand how the 
profession develops as shaped by its relations with other professions, and in relation to organisational needs 
and purposes. RDM itself is a fascinating locus of change, as part of a seeming return for libraries (and also
IT services) to support of research, one that could lead to significant reconfigurations of professional services,
e.g. in terms of skillsets required, interactions and styles of activity.


Abstract

“Open” approaches have the potential to advance significantly the mission of higher education and 
research institutions worldwide, but the multiplicity of initiatives raises questions about their coherence 
and points to the need for a more coordinated approach to policy development. Drawing on the European 
e-InfraNet project, we adopt a broad definition of Open, including activity alongside content, and identify 
the different Open domains, their salient characteristics and relationships. We propose a high-level 
typology and model of Open to inform policy design and delivery, and employ Willinsky’s framework for 
open source and open access to discuss the theoretical underpinnings of openness, finding important 
commonalities among the domains, which suggests that the framework can extend to all the Open areas. 
We then examine potential shared benefits of Open approaches, which reinforce the argument for a 
unified policy agenda. We conclude with some observations on limits of openness, and implications for 
policy.


1 Introduction


The “Open” agenda is assuming growing importance in the higher education and research (HER) community 
worldwide. Approaches such as providing open access (OA) to research publications, sharing open data, 
releasing open educational resources (OER), and developing open-source software (OSS) are becoming 
widespread. There are both bottom-up pressures, from researchers, librarians, educationalists, and 
technologists creating open systems and making content openly available; and top-down forces, with 
policymakers and research funders encouraging or even mandating open approaches (Andersen, 2010; Kelly, 
Wilson, and Metcalfe, 2007; Kuchma, 2008; Pinfield, 2012; Schuwer & Mulder, 2009; Stokker, 2011). 
Openness is “a trend, both in terms of the production and sharing of educational materials, as well 
as making research publications (and even research data) freely available” (Conole & Alevizou, 2010, p. 
42). Described as “two broad movements” (open research and open education), digital scholarship is 
crucially influenced by the convergence of social constructivism and Web 2.0 technologies in the late 20th 
and early 21st centuries (Esposito, 2013), but has historical roots as far back as the “open science” ethos of 
the late 16th and early 17th centuries (David, 2004). The “Open” agenda is accordingly extending apace 
into all areas and activities of the academy, impacting its core missions of teaching and research, as well as 
the systems and processes that are critical to individual and institutional success. It is becoming a guiding 
principle of HER in the modern world.
However, as observed by the educational philosopher, Michael Peters (2010, p. 80), openness is “a
complex code word for a variety of digital trends and movements...based on the growing and overlapping 
complexities of open source, open access, open archiving, open publishing and open science” and has “deeper 
registers that refer more widely to government (‘open government’), society (‘open society’), economy (‘open 
economy’) and even psychology (openness as one of the five traits of personality theory).” The multifarious 
open movements are at different stages of evolution and maturity, and the nature of the concept and the 
culture of the academy and its hinterland mean that there are continuing debates and disputes around what 
openness means in particular domains irrespective of their age and development. 

Moreover, despite obvious connections between different open activities at a conceptual level, these 
initiatives have typically been pursued within specialist communities without coordination. The related 
policy and practitioner literature is similarly disparate, although it shows increasing convergence between 
OER and OSS in the learning and teaching context (Andersen, 2010; Christiansen & Anderson, 2004; Conole 
& Alevizou, 2010; Iiyoshi & Kumar, 2008; Leeson & Mason, 2007; Wiley & Gurrell, 2009), and between
OSS and open science in the research arena (Lyon, 2009; Rhoten & Powell, 2007; Royal Society, 2012; 
Schroeder, 2007; Whyte & Pryor, 2011; Willinsky, 2005). Some commentators have used the Boyer (1997, 
pp. 24-25) model of scholarship lately to advance a more integrated view of openness, showing how digital 
practices are transforming all four of his categories of academic work: research/discovery, 
synthesis/integration, practice/application, and teaching (Garnett & Ecclesfield, 2011; Katz, 2010; Pearce, 
Weller, Scanlon, & Ashleigh, 2010). Recently, Wellen (2013) has examined the commonalities between OA 
to research outputs and MOOCs (Massive Open Online Courses) from a political economy perspective, 
arguing in particular that the “unbundling” of previously integrated processes and roles in their production
creates the conditions for “disruptive” change.

Peters and Roberts (2012, p. 4) are notable examples of the few scholars to investigate at a deeper, 
philosophical level the historical and contemporary connections between the diverse concepts of openness, 
specifically: 


“the social processes and policies that foster openness as an overriding educational and scientific 
value, evidenced in the growth of open source, open access, open education, and their convergences 
that characterize global knowledge communities.” 


Another significant contribution, but at the policy level, is the work of the European Network for
Co-ordination of Policies and Programmes on e-Infrastructure (e-InfraNet, 2013), which has scoped a broad
policy framework for open approaches in HER in the context of European Union initiatives on innovation
and the digital agenda. Informed by European projects and developments, supplemented by evidence from
global sources, e-InfraNet (2012, 2013) provides an overview and synthesis of different types of open activity 
and their relationships, and a compelling argument for openness as “the default modus operandi for research 
and higher education.” 

Within this context, our purpose here is to map out the current Open landscape from a policy 
development perspective, considering in particular the potential for greater coordination between different 
Open approaches. We first identify the main characteristics of the various Open domains, deploying a broad 
definition of “Open” to capture the present range of Open initiatives. We next advance and elaborate a 
high-level typology of Open to inform policy development, and discuss whether the different Open initiatives 
can be approached in a coordinated way as part of a single coherent policy agenda. We suggest that a 
framework put forward by Willinsky (2005) for understanding the convergence of open source, OA, and 
open science can extend to other Open domains. We then outline the potential shared benefits of the 
different Open approaches, which we argue strengthen the case for convergence, while also commenting on 
some limits of openness, and we conclude with our observations on the policy implications of our findings.


2 Definitions and Dimensions of Openness


Defining “open” clearly and unambiguously is important from a policy-development as well as a conceptual 
perspective. Different interpretations of the concept can result in different outcomes in practice and 
protracted debates among stakeholders, notable instances being text mining of journal articles, where only
some versions of OA permit harvesting and analysis of content (Clark, 2013; Howard, 2012; McDonald &
Kelly, 2012), and open standards in the IT industry, where there are competing visions of openness and how
it applies to the products and processes of standards development (ANSI, 2005; Cerri & Fuggetta, 2007; 
Tiemann, 2005). Varying interpretations of what “open” means are especially common when the particular 
phenomenon is at an emergent stage, exemplified by the different approaches to open peer review reported 
in the literature (Ford, 2013; Shotton, 2012; Ware, 2011). 

The various arenas of open activity have generated a range of definitions. The open access (OA) 
movement, in particular, has several widely cited definitions of the basic concept and salient dimensions 
that are potentially applicable to other open areas. The seminal Budapest Open Access Initiative (BOAI, 
2002) limits its scope to peer-reviewed journal literature (including unreviewed pre-prints) and defines the 
concept thus: 


“free availability on the public internet, permitting any users to read, download, copy, distribute,
print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data 
to software, or use them for any other lawful purpose, without financial, legal, or technical barriers 
other than those inseparable from gaining access to the internet itself. The only constraint on 
reproduction and distribution, and the only role for copyright in this domain, should be to give 
authors control over the integrity of their work and the right to be properly acknowledged and 
cited.” 


This early characterization significantly incorporates the ability to both view content and reuse it in various 
ways. Suber (2012, pp. 65, 66) disambiguates these issues using terminology from the software community 
to define two “sub-species of OA”: 


“Gratis OA is free of charge... Users must still seek permission to exceed fair use. Gratis OA removes 
price barriers but not permission barriers.” 


“Libre OA is free of charge and also free of some copyright and licensing restrictions... Libre OA
removes price barriers and at least some permission barriers.” 


Libre OA has recently proved controversial, generating extensive policy-based debate. Some publishers allow 
free viewing of content, but not various kinds of reuse without permission, significantly limiting the practical 
benefits of OA. Text mining often involves copying, reformatting, and analyzing large corpora of textual 
material, which contravenes the licenses of many publishers, even if they allow some kind of Gratis openness. 
Policymakers thus cannot assume that requiring authors to make outputs Open will necessarily allow 
content to be mined (or be used in other ways); so, when formulating policy, they need to consider carefully 
the level of openness required on the Gratis-Libre spectrum to ensure the intended practical outcome. 

Taking the perspective of an educationalist looking at open educational resources (OER), Wiley 
(2010, p. 16) also emphasizes the importance of user permissions in relation to open content, describing “4 
Rs” of Open: 


• Reuse: the right to reuse the content in its unaltered/verbatim form (e.g., make a backup copy of
the content)

• Revise: the right to adapt, adjust, modify, or alter the content itself (e.g., translate the content into
another language)

• Remix: the right to combine the original or revised content with other content to create something
new (e.g., incorporate the content into a mashup)

• Redistribute: the right to share copies of the original content, the revisions, or the remixes with
others (e.g., give a copy of the content to a friend)”


Wiley (2010) defines “reuse” narrowly, but his other “Rs” encompass a broad set of secondary-use activities, 
presented here as an essential feature of Open in the context of OER. Other formal definitions, including 
the Open Knowledge Foundation (OKF, 2011) Open Definition Project, also emphasize minimal restrictions 
on various sorts of reuse, reworking and redistribution. 
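
As an illustration only, the permission-based view of openness described above can be sketched as a set of independent flags, which makes it easy to check whether a given licence grants what a policy intends. The licence profiles and the policy goal below are hypothetical examples, not taken from Wiley (2010) or from any real licence.

```python
from enum import Flag, auto


class Permission(Flag):
    """The four user permissions ("4 Rs") described by Wiley (2010)."""
    REUSE = auto()         # use the content in unaltered/verbatim form
    REVISE = auto()        # adapt, adjust, modify, or alter the content
    REMIX = auto()         # combine the content with other content
    REDISTRIBUTE = auto()  # share copies, revisions, or remixes


ALL_FOUR = (Permission.REUSE | Permission.REVISE
            | Permission.REMIX | Permission.REDISTRIBUTE)


def missing_permissions(granted: Permission, required: Permission) -> Permission:
    """Return the required permissions that the licence does not grant."""
    return required & ~granted


# Hypothetical licence profiles, for illustration only.
view_only = Permission.REUSE
fully_open = ALL_FOUR

# Hypothetical policy goal: users can adapt the content and pass it on.
policy_goal = Permission.REVISE | Permission.REDISTRIBUTE

gap = missing_permissions(view_only, policy_goal)
no_gap = missing_permissions(fully_open, policy_goal)
```

Here `gap` holds the permissions a view-only licence would fail to deliver for the stated goal, while `no_gap` is empty. A policymaker could use the same comparison to decide how far along the Gratis-Libre spectrum a mandate needs to reach.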

Such definitions assume openness applies specifically to content, rather than more generally — for 
example, to activity. In contrast, some definitions from the IT and software community emphasize process- 
based openness. For example, Weber (2004, p. 56) states, 


“The essence of open source is not the software. It is the process by which software is created. 
Think of the software itself as an artifact of the production process. And artifacts are often not the 
appropriate focus of a broader explanation.” 


Understanding the process of OSS creation, sometimes known as “open development” (Anderson, 2009) to 
distinguish it from OSS as product, is important in grasping the full potential of Open in different contexts. 
The discourse on open standards similarly emphasizes development in an open process (Ray, Gulla, Dash,
& Gupta, 2011), and although definitions here exhibit varying levels of openness, they are typically
multi-dimensional, for example:


“Open standards are developed in a transparent and collaborative process, are available for free or 
at a nominal cost and can be implemented royalty free — in particular regarding software 
interoperability standards — or at reasonable cost.” (Undheim & Friedrich, 2008, p. 2) 


Table 1 illustrates the range of open phenomena found in academic discourse and practice, showing how 
the balance, granularity, and interplay of product and process are manifested in different domains. 


| Concept | Definition | Source |
|---|---|---|
| Open bibliography | “systematic efforts to create and maintain stores of Openly accessible, machine-readable bibliographic data” | Jones et al. (2011) |
| Open content | “a collective name for creative work published under a non-restrictive licence that explicitly permits the work to be copied and – depending on the particular licence chosen – to also be adapted and distributed.” | Keller & Mossink (2008, p. 13) |
| Open courseware (OCW) | “free and open digital publication of high quality college and university-level educational materials. ...organized as courses, and often include course planning materials and evaluation tools as well as thematic content. ...openly licensed, accessible to anyone, anytime via the internet.” | OCW Consortium [n.d.] |
| Open data | “Data that meets the criteria of intelligent openness. Data must be accessible, useable, assessable and intelligible.” | Royal Society (2012, p. 12) |
| Open development | “the community-led development model found within many successful free and open source software projects.” | Anderson (2009) |
| Open educational practices (OEP) | “collaborative practice in which resources are shared by making them openly available, and pedagogical practices are employed which rely on social interaction, knowledge creation, peer-learning, and shared learning practices.” | Ehlers (2011, p. 6) |
| Open educational resources (OER) | “teaching, learning and research materials in any medium, digital or otherwise, that reside in the public domain or have been released under an open license that permits no-cost access, use, adaptation and redistribution by others with no or limited restrictions.” | UNESCO (2012, p. 1) |
| Open innovation (OI) | “the use of purposive inflows and outflows of knowledge to accelerate internal innovation, and expand the markets for external use of innovation, respectively. ...assumes that firms can and should use external ideas as well as internal ideas, and internal and external paths to market” | Chesbrough (2006, p. 1) |
| Open literature review [open research] | “uses a social networking space to aggregate and collectively discuss an evolving body of literature around a set of core research questions.” | Conole & Alevizou (2010, p. 6) |
| Open notebook science | “a form of Open Science where the laboratory notebook is made public in as close to real time as possible” | Bradley, Owens, & Williams (2008) |
| Open peer review | “the opposite of double blind, in which authors’ and reviewers’ identities are both known to each other (and sometimes publicly disclosed), but... also used to describe other approaches, such as where the reviewers remain anonymous but their reports are published.” | Ware (2011, p. 25) |
| Open science | “making methodologies, data and results available on the Internet, through transparent working practices” | Lyon (2009, p. 6) |
| Open source | “the practice that gives free access in production and development to the source material for an end product; in most cases, one is dealing with software.” | Keller & Mossink (2008, p. 9) |
| Open systems | “...conform to internationally agreed standards defining computing environments that allow users to develop, run and interconnect applications and the hardware they run on, from whatever source, without significant conversion costs” | Bryant (1995, p. 32) |

Table 1: Sample Definitions of Open Concepts


Open activities in the HER arena are evolving in a complex, pluralist context, where multiple definitions 
prevail with varying levels of consistency. Several scholars have identified synergies between the different 
open approaches, but much of the discussion and development of policy and practice has taken place in 
specialist communities of interest, proceeding along parallel tracks, rather than across domains, in a coherent 
effort. An important contribution here is the crafting and promotion by e-InfraNet (2013, p. 12) of a simple, 
overarching definition of Open, which builds on a definition promulgated by CETIS (the former
JISC-funded Centre for Educational Technology and Interoperability Standards):


“Open means ensuring that there is little or no barrier to access for anyone who can, or wants to,
contribute to a particular development or use its output”


Significantly, openness here not only covers use of content, but also includes “contribution” to an activity.
The policy document explicates the key concepts:

• “little or no barrier to access” means there are little or no technological, organisational,
financial, legal, or even cultural restrictions to access. It also implies that access remains possible
over time.

• “for anyone who can, or wants to” means whether (s)he is a regular participant in the research
& higher education system or not, and whether (s)he actually contributes/(re)uses or not.

• “contribute to a particular development or use its output” means that “little or no barrier
to access” extends to “little or no barrier to participate in development and/or use the results of
that development”. It requires that outputs are available in their entirety (full text, complete data,
source code and so on), in formats that allow processing by humans and machines; that this remain
the case over time; and that access, participation and (re)use can be immediate. It also requires
that full documentation is available to enable understanding of what has been made open, to allow
for appropriate (re)use” (e-InfraNet, 2013, p. 12).


This is a pragmatic and wide-ranging definition, which intentionally creates opportunity for policy 
discussion and development. However, it immediately raises policy-based questions, particularly around 
levels of openness on the Gratis-Libre spectrum, already seen in relation to text mining, which may also 
apply more widely. It also raises a key question around the extent to which policymakers wish to take into 
account the development of meta-tools (including supporting documentation and metadata) that enable the 
reuse of Open materials, something which is inevitably resource-hungry. 


3 A Typology of Open 


Based on this broad definition, and building on the framework elaborated by e-InfraNet (2013), we propose 
a high-level typology, which divides the range of open approaches or domains identified in the literature 
and practice into three main types of openness: 


• Open Content
• Open Process
• Open Infrastructure.


Table 2 presents our typology, which augments the domains covered by e-InfraNet (2013, p. 11) by adding 
open bibliography, open educational practices, and open systems. 


| Open Type | Open Domain |
|---|---|
| Open Content | Open access to research publications (OA) |
| | Open data |
| | Open educational resources (OER, including open courseware) |
| | Open bibliography (also known as open metadata) |
| | Open source software (OSS) |
| Open Process | Open development (also known as open development method, ODM) |
| | Open educational practices (OEP) |
| | Open peer review |
| | Open science/open research |
| | Open innovation |
| Open Infrastructure | Open standards |
| | Open systems |


Table 2: Open Typology


The key motivation associated with the Open Content domains is making content of various sorts freely
accessible and available for reuse. Such content might include publications, theses and dissertations, 
datasets, learning objects, metadata, or computer code; Suber (2012, pp. 98-99) provides additional 
examples. The Open Process domains all aim to carry out academic or business processes in the public 
arena. Whilst they expose content, the primary purpose of this content is contributing to process, rather 
than being product. Open Infrastructure aims to produce an interoperable technical environment supporting 
the work of the HER community. 
Figure 1 provides a high-level model of the open types indicating how they relate to each other. 


[Figure 1 diagram not reproduced; recoverable labels: Open Content, Open Infrastructure, Open Culture]


Figure 1: High-Level Open Typology


The typology as a whole has correspondences with parts of the P2P Foundation framework, although the 
latter is designed to inform a wider political and social agenda and takes a cross-sectoral perspective covering 
a large set of issues outside HER (e.g., Open Government and Open Business). The P2P framework includes 
(amongst others) the types “Products of Openness”, “Practices of Openness”, and “Infrastructures of 
Openness” (Good & Bauwens, 2010; Tkacz, 2012, pp. 395, 396), which broadly correspond to Open Content, 
Open Process, and Open Infrastructure as outlined in the HER context. 
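
For readers who want to work with the typology programmatically, the grouping in Table 2 and the correspondence with the P2P Foundation types can be sketched as two small lookup structures. This is only an illustrative encoding: the variable and function names are our own assumptions, and the type names follow the three-item list given in the text.

```python
# Open types mapped to their constituent domains, following Table 2.
TYPOLOGY = {
    "Open Content": [
        "Open access to research publications (OA)",
        "Open data",
        "Open educational resources (OER)",
        "Open bibliography",
        "Open source software (OSS)",
    ],
    "Open Process": [
        "Open development",
        "Open educational practices (OEP)",
        "Open peer review",
        "Open science/open research",
        "Open innovation",
    ],
    "Open Infrastructure": [
        "Open standards",
        "Open systems",
    ],
}

# Broad correspondence between P2P Foundation types and the HER typology.
P2P_EQUIVALENT = {
    "Products of Openness": "Open Content",
    "Practices of Openness": "Open Process",
    "Infrastructures of Openness": "Open Infrastructure",
}


def open_type_of(domain: str) -> str:
    """Return the Open type that a given Open domain belongs to."""
    for open_type, domains in TYPOLOGY.items():
        if domain in domains:
            return open_type
    raise KeyError(f"Unknown Open domain: {domain}")


peer_review_type = open_type_of("Open peer review")
domain_count = sum(len(domains) for domains in TYPOLOGY.values())
```

A flat mapping of this kind is enough to tag initiatives by type when surveying policy coverage; it does not attempt to capture the relationships between types discussed in the next section.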


4 Elaborating the Open Typology 


Relationships between the different Open types are a key conceptual and policy issue. e-InfraNet (2013) 
proposes a particular view of their development: 


“The availability of and access to e-infrastructures and content are necessary conditions for 
efficiency and effectiveness in modern research and higher education. For sustained and sustainable 
development and innovation — both within and outside research and higher education — open 
participatory and collaborative approaches are also required. As the availability of and access to 
content and infrastructural resources increases, the need for and use of ‘open processes’ becomes 
more evident. Where ‘open content’ is used and produced in ‘open processes’ within an open 
infrastructural setting, a culture of ‘openness’ gradually emerges” (e-InfraNet, 2013, p. 13). 


This hypothesis lays out a possible set of relationships between the Open types as they develop, and 
reinforces the case for Open becoming a coherent modus operandi for HER, as e-InfraNet (2013, p. 53) 
recommends. It takes as its model OSS-related development and products, where the open development 
process is instrumental in producing OSS content. Its advocates expect OEP and OER to have a similar 
process-content relationship in future (Ehlers, 2011); thus, following the huge success of OCW, MIT 
launched a new initiative to 


“share not just the content that MIT uses in teaching — the original OCW model — but also explicit 
information on how we teach at MIT ...pedagogical statements from and interviews with
participating faculty, links to exemplary teaching practices, showcases of educational innovations,
and other framing information that places the content shared in context of our teaching 
philosophies” (Abelson, Miyagawa, & Yue, 2012, p. 9). 


Many MOOCs further develop the process-content relationship in the educational field by opening up
not merely educational content, and possibly its production, but also the process of its consumption,
through the “connectivist” approach of online learner interaction during the learning process
(Cooper & Sahami, 2013; de Waard et al., 2011; Liyanagunawardena, Adams, & Williams, 2013).

Such ways of working could increasingly become mainstream for a range of activities within the 
HER community, though this would involve major cultural change, as recognized by e-InfraNet (2013, pp.
12, 40) in explicitly presenting Open as “Content, Process, Infrastructure and Culture”, and emphasizing 
its “deep impacts” at different cultural levels: global, national, political, organizational, and personal. 

The relationship between Open movements and culture is complex. Open requires cultural change, 
but also is likely to generate change. While wide dissemination is currently assumed for research publications 
and standards, mainstream practices in other areas typically operate on quite different assumptions; for 
example, peer review is normally confidential. The relationships between different Open types and their 
constituent domains are also complex. While it is conceivable that OSS as a product could be developed in a
closed environment, ODM is the naturally preferred development method for OSS, so there is an
instrumental relationship between them. However, in other cases an instrumental relationship between Open
Process and Open Content is less clear: open science as a process and the production of OA content are not
necessarily linked; OA does not necessitate open science, nor vice versa. Nevertheless, taken overall, as both
cause and effect of cultural change, Open amounts to very significant levels of change, involving the
development of a radically different set of cultural norms in HER.

There are major challenges here for policymakers wishing to shape Open initiatives. First, to develop 
a rationale and priorities for investment, they must understand the importance and state of the different 
Open types and domains and their relationships. Whilst emphasizing the interconnectedness of the different 
Opens and the need for coordinated policy development, e-InfraNet (2013) recommends acceleration with 
Open Content and Open Infrastructure domains, which are arguably more mature than most Open Process 
domains, since barriers to wide implementation are likely to be lower; it suggests more experimentation is 
needed in most Open Process areas to inform further policy development. 

Secondly, to bring Open approaches into the mainstream, policymakers need to facilitate cultural 
change. Policymakers cannot themselves effect such change, but they can incentivize behaviors likely to 
encourage change in academic practices and culture, albeit gradually. One key aspect is how research and 
scholars are evaluated or assessed, which has traditionally concentrated on published papers in high-impact 
journals. Andersen (2010, p. 43) suggests that “participation in open digital activities...should count toward 
tenure and promotion”, and e-InfraNet (2013, p. 51) similarly argues for


“a broader set of criteria that focus on the contribution to the advancement of knowledge. Such a 
contribution can be made in many different ways: by publishing an article, but also by educating 
students, by communicating about research questions in forums and blogs, by making datasets 
available, by cooperating in ‘open’ projects to name but a few examples.” 


Using these insights into relationships between phenomena of interest in the open environment, we offer a 
provisional model of Open, which depicts the types of Open and their interactions in an evolving open 
culture. Figure 2 displays our relational model of openness, showing potent reciprocal influences of Open 
types and Open culture on each other in a context of policy stimulus and support.


Figure 2: An Evolving Model of Open


5 Open Convergence and Coherence 


The different Open domains have developed through a wide range of different initiatives managed at various 
levels: by institutions, consortia, national agencies, foundations and international bodies. Institutions such 
as MIT have led the OER agenda, launching OCW in 2001 to make all its online learning materials freely 
available, and expanding this in 2012 into edX, a consortium of major US universities committed to OER; 
the related OCW Consortium began in the US, and now has members worldwide (Peters, 2010; Yuan & 
Powell, 2013). Developments in OER are mirrored in other Open domains. In some cases, national bodies, 
such as JISC (UK) and SURF (Netherlands), sponsored programs to promote Open approaches, particularly 
around OA, and more recently open data and OER (Procter, Halfpenny, & Voss, 2012; Tedd, 2009; van 
der Kuil & Feijen, 2004); e-InfraNet (2012) identified 48 examples in ten European countries. The European 
Commission has also sponsored European-wide programs, particularly linked to Open Content and Open 
Infrastructure developments, including DRIVER and DRIVER2, augmented by OpenAIRE, to encourage 
adoption of repository technology, and projects associated with the GEANT network promoting 
interoperability with national networks (Dijk & Van Meel, 2010; Lossau & Peters, 2008). International 
organizations, such as OKF, have also contributed to the Open agenda; sometimes Open approaches in HER
have been linked to wider political or social movements, such as the P2P Foundation campaigning for 
greater openness in various ways (Peters, 2010). 

Such initiatives have normally been pursued by different communities of practice, often with little 
or no explicit connection between them. For example, OA has been promoted by various stakeholders, 
including funders, librarians, and researchers in particular disciplines; whereas OER has typically been 
promoted by learning technologists and educationalists. The policy-based and practitioner literature on the 
different Open domains has seldom interacted in a meaningful way. Motivations for the different domains 
are typically articulated only in relation to their specific environments, amounting to 


“a patchwork development of multiple open approaches, in response to different drivers in different 
contexts, that vary in maturity; there is not yet an ‘Open’ Agenda as such” (e-InfraNet, 2013, p. 
7). 


The apparent lack of convergence and coordination raises critical questions: 


• Can the various Open domains form a single coherent policy agenda?

• More fundamentally, can the different Open domains legitimately be considered a single set of
interrelated developments; or are they essentially separate initiatives, without any meaningful
connections?

Willinsky (2005) provides a framework for considering the coherence of the different Open domains. His
focus is open source, open access, and open science, whose fragmentation he notes, while also observing a
natural “convergence” of the different domains, albeit often “unacknowledged”, and largely “unrealized.”
His nuanced argument works on three levels:


1. The different Open domains have a shared “commitment” 
2. They are governed by a set of common “economic principles” 
3. The domains have shared characteristics (derived from 1 and 2). 


To which we add, 
4. The de facto interconnectedness between the Open domains is continuing to develop. 


The different Opens are founded on a shared “commitment to the unrestricted exchange of information and 
ideas” (Willinsky, 2005). This fundamental tenet of Open approaches creates an obvious, but nonetheless 
important, level of coherence across all Open domains; it allows academic inquiry and creativity to flourish, 
and is also fundamental to the functioning of democratic systems. This wider societal argument, often 
articulated around concepts of “transparency”, underpins many cases for greater openness, and is often 
deployed by policymakers, particularly in relation to Open Content. For example, current UK government 
support for OA and related strengthening of OA mandates by government research sponsors are often 
expressed in terms of transparency (Jha, 2011; RCUK, 2013). The “transparency” argument for Open 
Content can also be applied to Open Process: for example, open peer review has the potential advantage of 
making the quality control process at the center of scholarly communication more transparent. The 
immaturity of many Open Process domains (such as OEP and open science) means it is unclear exactly 
how this might play out. Nevertheless, the transparency argument remains important, and may itself be 
sufficient justification for a coordinated policy approach. 

The Open domains also share three broad “economic principles”, based on: 


1. the efficacy of free software and research; 
2. the reputation-building afforded by public access and patronage; and, 
3. the emergence of a free-or-subscribe access model (Willinsky, 2005). 


The first principle discusses the notion of “free” knowledge and resources, “free” here primarily referring to 
openness and allowing unrestricted reuse (e.g., Libre OA), revealing how information and knowledge 
resources are especially conducive to being managed as a “common-pool resource” (Hess & Ostrom, 2007), 
because they are nonsubtractive and hence nondepletable (Corrall, 2000). Indeed, information resources are 
structurally abundant, reflected in their tendency to generate more information (exemplified in HER by the 
cumulative nature of scholarly knowledge), and their characteristic of “gaining value when shared or 
(re)used” (Corrall, 2000, p. 189) is a powerful argument for sharing, particularly in the digital environment, 
where use of knowledge objects is also nonrivalrous (Hess & Ostrom, 2007; Wiley, 2010). 

The second principle describes the “economics of patronage”, drawing on David’s (1998, 2004) 
comparison of the “open science” movement of the 17th and 18th centuries, usually funded by wealthy 
patrons, with “today’s public patronage of research and scholarship.” Willinsky (2005) illustrates how such 
funding supports scholarly behaviors that promote openness, drawing detailed analogies between scholarly 
inquiry and OSS, demonstrating a convergence of characteristics: 


“Entire fields of inquiry emerge, as one article builds on another, sometimes by critique and 
refutation, and sometimes by replication and extension ... the research literature, as a whole, acting 
like an operating system that enables others to run new programs of research and to contribute, in 
turn, to the learning of others. Scholars carefully document their research methods, data sources, 
and references in ways that enable others to run the same experiment and consult the same resources 
... The research article is part of a larger, very complex code on which other researchers build, 
debug, and extend, always with the intent of turning it back to the research community.” 


The argument here could apply more widely to other academic outputs, and possibly processes; research 
data and bibliographic information, for example, fit within this academic “operating system.” Willinsky 
(2005) also identifies similarities in the motivations of software engineers contributing code and researchers 
augmenting the literature, with intellectual curiosity and the creative impulse essentially driving both 
groups, along with the peer recognition that characterizes the academic value system of “cooperative 
rivalries” (David, 1998, p. 17), or “competitive sharing” (Pinfield, 2012, p. 53). Prior contributions to the 
shared knowledge base create competitive advantage in the HER “economy of recognition” that potentially 
extends to the other open domains. 

The third principle is the “free-or-subscribe model for accessing intellectual properties and public 
goods” that enables an “alternative economy” to coexist with commercial operations. Willinsky (2005) 
concentrates on entrepreneurs creating fee-based support services around free software, and does not cover 
the full spectrum of economic models in publishing, where OA and journal subscriptions can operate 
together or separately; nor does he discuss other value added services offered by publishers. The mixed 
economy principle is valid across the open domains, and open movements are already generating commercial 
opportunities, a notable example being the 2012 launch by Thomson Reuters of Data Citation Index, a 
priced product enabling discovery of (open) research datasets (Torres-Salinas, Martin-Martin, & Fuente- 
Gutiérrez, 2013). 

This framework deepens our understanding of the ethical commitment and economic principles 
shared by the different open domains, but also reveals and illuminates other common characteristics, 
particularly motivational drivers (intellectual curiosity, reputation building, competitive sharing), creating 
conditions for viewing the Opens as a single coherent phenomenon. 

While the strength of the links between domains varies, their evident connectedness supports the 
case for policy coordination, a case which is reinforced by explicit manifestations of interconnectedness 
across Open domains. For example, OA services, such as institutional repositories, commonly deploy OSS 
products, including DSpace and EPrints (Mittal & Mahesh, 2008; Pinfield et al., in press; Tedd, 2009). OA 
publication of research datasets alongside or embedded (interactively) in related journal articles, enabling 
validation of results (Rzepa, 2011; Shotton, 2012), is another manifestation of interconnectedness. 
Furthermore, for OERs and MOOCs to achieve their full potential they often require other complementary 
Opens, including open textbooks and research outputs. The Association of Research Libraries (ARL) 
therefore urges MOOC providers to “Set the Default to Open” for both course content and reading material 
(Butler, 2012, p. 14). e-InfraNet (2013, p. 48) articulates the interconnectedness of Open types as a general 
principle: 


“if content is open, the means with which to access and process it — manually and/or through 
machine processing — needs to be open as well.” 


DRIVER is a European example promoting open infrastructure, processes and content together at a 
practical level (Lossau & Peters, 2008). 


6 Benefits of Open 


Drawing on e-InfraNet (2013), Read (2011), and other sources, we find six significant potential benefits, 
shared by the open domains, which offer important advantages for inquiry, pedagogy, and society, and 
which support the case for a unified policy agenda. While the evidence base is incomplete (reflecting the 
immaturity of developments), we suggest the six dimensions of Open advantage serve as a framework for 
monitoring activity, recording progress, and reviewing policy. 






6.1 Visibility and impact 

A growing body of evidence shows that OA increases usage and creates “citation advantage” for researchers 
(Swan, 2010; Wagner, 2010; Xia & Nakanishi, 2012), though negative effects have also been reported for 
humanities scholars (Xu, Liu, & Fang, 2011). Davis (2011, p. 2133) argues that the biggest potential benefit 
is “outside the core research community” (to those who consume, but do not contribute to the literature), 
confirmed by the UK public and voluntary sectors (Beddoes, Brodie, Clark, & Hoong Sin, 2012; Look & 
Marsh, 2012). Studies covering Australia, Denmark, Germany, the Netherlands, UK and US have identified 
wider social and economic benefits (Houghton, 2006; 2009; Houghton & Sheehan, 2009; Houghton, Dugall, 
Bernius, & Krönung, 2012; Houghton, Rasmussen, & Sheehan, 2010), and individual case studies showing 
commercial impact also exist (KE, 2011). Institutions engaging with OER have similarly gained visibility 
and impact, notably MIT, whose material has attracted massive usage worldwide and reached learners in 
less developed countries (Atkins et al., 2007). The evidence for other domains is limited, although a citation 
advantage for papers linking to open data has been found (Dorch, 2012; Henneken & Accomazzi, 2012; 
Piwowar, Day, & Fridsma, 2007). 


6.2 Reuse 


The ability to reuse, reanalyze, recombine, and redistribute open material has transformed scientific 
practice, with retrievals of data from archives increasingly outnumbering data deposits; working on existing 
data is especially beneficial for large-scale and high-cost projects, such as the human genome and the Hubble 
telescope, or any endeavors where compiling data is labor-intensive (Ascoli, 2007; Beagrie, 2006; IHGSC, 
2004). Reusing large corpora of scholarly articles for text mining is established practice in the biomedical 
field (Zweigenbaum, Demner-Fushman, Yu, & Cohen, 2007), but evidence suggests such techniques are 
applicable across many more disciplines (Delen & Crossland, 2008), and there is “clear potential for 
significant productivity gains” and improved research quality in the HER sector, and also wider economic 
and societal benefits (McDonald & Kelly, 2012, p. 4). Despite the availability and recognized benefits of 
OER (e.g., quality enhancement, cost reduction), there is conflicting evidence on the level of reuse by 
teachers and learners in higher education practice (Hodgkinson-Williams, 2010; OPAL, 2011; White & 
Manton, 2011). 


6.3 Innovation and agility 


The removal of barriers to free flow of information enabled by Open Content and Open Infrastructure 
promotes innovation in HER and beyond. Evidence here is limited, but includes case studies, such as the 
ATLAS project at CERN, whose innovative use of social media is enabled by OA material (Doyle, 2011). 
The Open Educational Quality Initiative also found substantial evidence that “Using OER leads to 
institutional innovations” (OPAL, 2011, p. 69), and there are also examples where OSS has delivered timely 
software solutions, improving systems and processes (University of Oxford, 2010), and demonstrating 
institutional agility. 


6.4  Cost-effectiveness 


e-InfraNet (2013, p. 16) notes that openness enables “efficient use of expensive resources, shared approaches, 
reduce duplication of effort and can save time”, citing large-scale Open Infrastructure initiatives as a prime 
example. OER and OA can also improve the cost-effectiveness of teaching, learning, and research; for 
example, by using free reusable learning objects (RLOs) in course design (Christiansen & Anderson, 2004), 
adopting open textbooks (Bliss, Hilton, Wiley, & Thanos, 2013), and moving from subscription-based 
journals to Green or Gold OA literature (Jubb, Cook, Hulls, Jones, & Ware, 2011). 


6.5 Quality enhancement 


Increasing visibility of content and inviting input from others creates a “virtuous circle ...improving quality 
of learning, research, software and administration” (e-InfraNet, 2013, p. 14). One-third of MIT OCW faculty 
report the process “improves their course materials” (d’Oliveira & Lerman, 2009), while citizen science and 
volunteer computing are helping to solve problems previously beyond the reach of research teams (Lyon, 
2009; Royal Society, 2012). 


6.6 Reputation and trust 


Availability of Open Content and more open conduct of research can promote institutional expertise to 
industry and the media, enhance confidence in HER institutions as public bodies, and mitigate the risks of 
unmanaged exposure of data or other materials. The value of OER for marketing and branding in student 
recruitment is widely recognized (Yuan & Powell, 2013): one-third of incoming students cited MIT OCW 
as “a significant influence in their choice of school”; and one-third of participating faculty reported 
publication of their course materials had “improved their professional standing in their field” (d’Oliveira & 
Lerman, 2009). 


7 Limits of Open 


Understanding the limits of Open domains will challenge policymakers. Most Open approaches arguably 
have “natural” limits, which need to be identified and tested. For example, OA to research literature is 
typically defined in terms of peer-reviewed journals (BOAI, 2002), which are royalty free, unlike 
conventionally-published monographs (though there have been experiments with OA e-monographs, which 
again are characteristically royalty free). OA thus assumes authors are not paid directly for their work, so 
it is reasonable to define its natural limit as the royalty-free research literature, rather than the research 
literature as a whole, which then has implications for policy development in designing institutional or funder 
OA “mandates.” 

There are important reasonable limits to openness for research data, such as publishing findings 
before research data are shared, maintaining commercial confidentiality for industry sponsored research, 
and respecting the privacy and sensitivity of research subjects; research ethics committees/institutional 
review boards often restrict secondary use of data related to human participants — access to datasets may 
be limited to qualified researchers, or denied pending inspection for quality (Eschenfelder & Johnson, 2012; 
Smith, 2011). Data then may have to be anonymized before sharing and processed for re-use, limiting what 
can be shared and when. Policy developers need to establish clear criteria for selecting data to be shared, 
protocols for the timing of sharing, and enabling processes and systems. 

Selectivity is also likely to prevail in relation to OER. Unlike MIT, most institutions do not choose 
to share everything, to protect existing business models of fee-based courses. Policy therefore must focus on 
developing criteria for sharing. We need to recognize that Open resources will continue to exist in a mixed 
environment. In particular, while software produced within the community to support research and teaching 
could become open by default, HER institutions will likely continue to deploy both open-source and 
commercial solutions to support both academic and administrative functions. 

Notions of “selectivity” and “mixed economy” are controversial and may be used to perpetuate 
fundamentally non-Open approaches; for example, publisher embargoes on self-archiving research papers 
may delay OA beyond their useful life. UK research funders are challenging embargoes of more than six 
months for STEM disciplines or 12 months for arts, humanities and social sciences (RCUK, 2013), but their 
policy intervention has divided stakeholder opinion, with some research universities arguing for a much 
longer embargo for non-STEM subjects (1994 Group, 2013). 

Another limitation on achieving real openness is the extra effort required (actual or perceived) in 
comparison with existing practices. Future policy debate is likely to focus on the limits of Open and their 
implications, while experimental work will continue to challenge the positioning of community-accepted or 
policy-sanctioned boundaries. Further investigation is required here, from both research and policy 
perspectives. 


8 Policy Implications 


Our analysis has highlighted the pluralities and complexities of the open landscape, including factors 
policymakers should consider when designing interventions — such as the levels, benefits, and limits of 
openness, as well as the development paths, maturity stages, and interrelationships of the different domains, 
including their potential convergent momentums, and relevant cultural dimensions. Other specific challenges 
to openness are discussed in the literature (e.g., intellectual property rights, business models, sustainability), 
though generally in relation to particular domains, and we suggest that here too the HER community needs 
cross-domain work that examines issues holistically, such as the Dutch study that produced licensing 
recommendations for sharing both educational and research materials (Keller & Mossink, 2008). A more 
holistic approach is likely to highlight further challenges that can be most effectively addressed through 
policy interventions which take into account the multi-dimensional nature of the problem. One example of 
this is the extent to which Open approaches require faculty (and others) to carry out additional work. Sharing 
of research data, for example, often requires extensive processing of datasets and production of metadata 
to enable reuse. Opening up educational resources similarly requires additional work to create contexts for 
wider use and reuse. A holistic approach to comprehending and addressing these challenges is more likely 
to result in workable policy solutions. 

A related issue already identified but not fully examined is the multiplicity of stakeholder groups 
across the open domains, including their particular roles in open initiatives, and the impacts on them. We 
suggest again that potential synergies across the domains could be better exploited by viewing Open 
holistically, for example transferring lessons learned and skills developed from one open domain to another. 
Also, as e-InfraNet (2013, p. 49) notes, a “fragmented perspective” may not only slow down development, 
but may “adversely affect...the entire system.” Supra-institutional agencies (national and international 
organizations) can influence behaviors here by funding programs that require cross-domain rather than 
single-domain open developments. 

Policy initiatives ultimately must focus on developments at the institutional level, where scholarly 
activities with open potential take place. Emerging evidence suggests openness can enhance performance in 
relation to HER missions of teaching, learning, research, and enterprise/knowledge transfer, with benefits 
for individuals, communities, economy and society. Our study suggests institutions will gain additional 
advantage through integrated (not separate) policies that exploit the convergence of open domains and 
recognize general common benefits, while observing particular domain-specific limits (e.g., adjusting 
academic reward systems to encourage behavior that will increase openness in both research and teaching). 


9 Conclusion 


Taken together, the ethical commitment, economic principles, common characteristics, de facto 
interconnectedness, and potential benefits shared by the different domains make the case for the convergence 
and coherence of Open initiatives. More work is needed to test the arguments in some areas and to 
strengthen the evidence base in others. The frameworks presented here can be used to inform policy 
discussion and future studies of Open. 


Abstract 

Social tagging systems enable their users to access useful or interesting information resources in various 
ways. The purposes of this study are to identify the information seeking modes adopted by users in this 
context and to determine the popularity as well as effectiveness of these modes. A transaction log file 
obtained from Douban, the most influential Chinese-language social tagging system, was examined based 
on an original clickstream data analysis framework. The results show that encountering, browsing by 
resource/tag/user/group, searching, and monitoring by user/group are the major modes adopted. 
While browsing by resource is the most popular mode, browsing by tag is the most effective one. The 
research findings enrich our understanding of social tagging systems as vibrant information seeking 
environments and provide useful implications for their interface design. 


1 Introduction 


The most recent revolution in the information landscape, namely Web 2.0, not only inherits the diversity 
and dynamism of the Web, but demonstrates even greater complexity for allowing ordinary users to create, 
store, and share their own information resources (Marlow et al., 2006). Accordingly, users are driven to 
assume the responsibilities of describing and categorizing the resources to make them findable. They achieve 
this through a lightweight yet efficient cataloging practice known as “tagging” — a user adding metadata or 
keywords to a resource (Golder & Huberman, 2006). 

Tagging is essentially an individual activity since users tag according to their personal 
understanding and in a distributed manner. It becomes social as the social tagging system aggregates users’ 
tags into a social classification system called “folksonomy” (Kroski, 2005). Social tagging systems, of 
particular interest to this study, are unconventional information systems. They are dedicated to preserving 
users’ collections of information resources and basically rely on tagging to organize the resources (Kalbach, 
2007). These two features distinguish them from other websites also supporting tagging, such as 
Amazon.com which has introduced customer tagging to supplement the well-constructed “departments” of 
products. 

As more and more users register with various social tagging systems, the Web is actually 
experiencing the fast self-growth of numerous information repositories, many of which accommodate 
substantial quantities of resources. However, until now little has been known about how 
their users are coping with information overload, as evidenced by the lack of relevant research. This study 
is among the first to investigate users’ information seeking behavior in social tagging systems. To be more 
specific, it aims to address the following research questions: 


1. What are the information seeking modes adopted by social tagging system users to find resources? 
2. How popular is each mode among the users? 
3. How effective is each mode in helping the users find resources worth collecting? 


It is worthwhile to probe into the above questions considering that helping users find needed resources 
and/or discover interesting ones is among the major goals of social tagging systems (Smith, 2008). Being 
blind to users’ actual behavior can be very dangerous to systems that live on user participation. In return 
for their efforts in tagging, users are expecting the expedient acquisition of needed resources. The elements 
associated with their frequently adopted information seeking modes, from the perspective of user-centered 
design, should be easily accessible on the interfaces. If users’ expectations are not met, they would be less 
motivated to contribute tags, leading to inadequately organized systems. 


2 Related Works 


2.1 Theories of Information Seeking Modes 


The modes in which people look for specific pieces of information have been extensively addressed in the 
literature on information seeking behavior. Marchionini (1995) distinguished two classes of information 
seeking strategies at the extremes of continua: analytical searching strategies are goal-driven and require 
planning; and informal browsing strategies are opportunistic and depend on interaction. According to 
Wilson (1997), active search, i.e. seeking out information actively, is the principal information seeking mode 
and is complemented by three others: passive attention, passive search, and ongoing search. The two passive 
modes refer respectively to the unanticipated and anticipated acquisition of information, and ongoing search 
to the updating of information. In Choo et al. (1999), four scanning modes explained in a similar way, including 
undirected viewing, conditioned viewing, informal search, and formal search, are integrated with the 
behavioral model (Ellis & Haugan, 1997) describing six characteristics underlying complex information 
seeking patterns (starting, chaining, browsing, differentiating, monitoring, and extracting) in order to 
indicate which activities are likely to occur frequently for each mode. 

These studies established a preliminary division of information seeking modes, whereas 
Bates (2002) provided the most focused and thorough interpretation of that division. Taking into account 
two dimensions, namely the degrees to which an individual seeks information actively and directionally, she 
identified searching, browsing, being aware, and monitoring as the four modes. Searching and browsing fall 
in the “active” category because both demand that people invest time and effort to obtain information, but 
they also differ from each other because the former is in principle guided by an articulable need whereas 
the latter usually starts with no particular need (Bates, 2002). Correspondingly, while searchers apply 
cognitive resources to recall from memory certain queries that express their information needs, browsers 
utilize their perceptual abilities to recognize relevant information from the context (Marchionini, 1995). 

Comparatively, most people are much less familiar with the being aware and monitoring modes 
which are often deemed informal. Being aware is simply absorbing random information that comes to us. 
Researchers (Erdelez, 1997; Williamson, 1998) have probed into this mode as “information encountering” 
in particular. Everybody encounters information, information can be encountered everywhere, and the 
encountered information can be used to address any purposes (Erdelez, 1999). A little different from 
encountering, monitoring is absorbing related information that comes to us. We do not act to find answers 
to the questions already in our mind but notice the answers when they appear. Social activities are very 
supportive of monitoring: people are likely to come across a great deal of useful information just in the 
process of interacting socially with others (Bates, 2002). 
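
The two-dimensional division attributed to Bates (2002) above can be read as a simple lookup from a pair of attributes to a mode. The following is only an illustrative sketch: the boolean encoding of the two dimensions and the names `Mode`, `BATES_MODES`, and `classify_mode` are our own assumptions, not part of the cited framework.

```python
from enum import Enum


class Mode(Enum):
    SEARCHING = "searching"
    BROWSING = "browsing"
    MONITORING = "monitoring"
    BEING_AWARE = "being aware"


# Keys are (active, directed) pairs. Encoding each dimension as a boolean is
# an assumption made for illustration; the text describes them as degrees.
BATES_MODES = {
    (True, True): Mode.SEARCHING,     # active, guided by an articulable need
    (True, False): Mode.BROWSING,     # active, no particular need
    (False, True): Mode.MONITORING,   # passive, related information
    (False, False): Mode.BEING_AWARE, # passive, random information
}


def classify_mode(active: bool, directed: bool) -> Mode:
    """Return the mode for a given combination of the two dimensions."""
    return BATES_MODES[(active, directed)]
```

For instance, `classify_mode(active=False, directed=True)` yields `Mode.MONITORING`, matching the description of monitoring as noticing answers to questions already in mind without acting to find them.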


2.2 Information Seeking in Social Tagging Systems 


Social tagging systems have grown into a promising research area (Trant, 2009). There has been a persistent 
interest in users’ tagging behavior, including tag usage (type/subject), ranking, growth, distribution, 
co-occurrence, and so forth (Golder & Huberman, 2006; Marlow et al., 2006; Kipp & Campbell, 2006; Farooq 
et al., 2007; Bischoff et al., 2008; Du et al., 2009; Kakali & Papatheodorou, 2010; Golbeck et al., 2011). In 
contrast, little work has gone specifically into users’ information seeking behavior in this particular context. 
But we still endeavored to identify a range of existing studies that are relevant to different extents. 

It has been noted that social tagging systems are conducive to information exploration (Jiang & 
Koshman, 2008). That is, one has a good opportunity to discover unknown or unexpected resources which 
would not be found through directed searching (Kroski, 2005). When a user’s information need is not well 
defined, according to Begelman et al. (2006), he or she may want to explore what other users have tagged. 
This is made possible by aggregating most recently or frequently tagged resources, as well as enabling pivot 
browsing, in which a click on a username or a tag leads people to the resources collected by that user or 
associated with that tag (Millen, 2008). In a study on the dogear social bookmarking service, the results 
showed that approximately 60% of the visitors navigated through the aggregated collection of bookmarks 
by user-supplied tags, by users, or by combinations of the two (Millen & Feinberg, 2006). The findings of 
another study also suggested that the navigational functions of a social bookmarking service should provide 
sufficient information about the attached tags and social presence of other users for each bookmark 
(Klaisubun et al., 2007). Such navigation is social in nature and exclusively afforded by social tagging that 
aims at generating a map that summarizes an explorable space (Chi & Mytkowicz, 2007). In this way, users 
are empowered to make new connections not predefined by the systems, allowing for innovative uses 
(Winget, 2006). 
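
Pivot browsing, as described in this paragraph, amounts to looking up resources by user or by tag. A minimal sketch follows, assuming tagging events arrive as (user, resource, tags) triples; the class `TaggingIndex` and its method names are hypothetical and do not describe how any of the cited systems is implemented.

```python
from collections import defaultdict


class TaggingIndex:
    """Two inverted indexes over tagging events (illustrative only)."""

    def __init__(self):
        self.by_user = defaultdict(set)  # username -> resources collected
        self.by_tag = defaultdict(set)   # tag -> resources carrying that tag

    def add(self, user, resource, tags):
        """Record that `user` collected `resource` and assigned `tags`."""
        self.by_user[user].add(resource)
        for tag in tags:
            self.by_tag[tag].add(resource)

    def pivot_on_user(self, user):
        """Resources collected by the given user, in sorted order."""
        return sorted(self.by_user.get(user, ()))

    def pivot_on_tag(self, tag):
        """Resources associated with the given tag, in sorted order."""
        return sorted(self.by_tag.get(tag, ()))
```

A click on a tag or a username then corresponds to a call to `pivot_on_tag` or `pivot_on_user`; an unknown tag or user simply returns an empty list.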

However, known-item search in social tagging systems usually lacks effectiveness (Begelman et al., 
2006). This is because folksonomies lack precision: “when it comes to findability, their inability to handle 
equivalence, hierarchy, and other semantic relationships causes them to fail miserably at any significant 
scale” (Morville, 2005, p. 139). Since the vocabulary problem is inherent in free-form social tagging, the 
marriage of folksonomies and the controlled vocabularies used in professional indexing is advocated 
(Rosenfeld, 2005). Also, what should not be ignored is the problem of tag spamming caused by adding 
attractive yet inappropriate tags to a resource in order to draw traffic to it, which could be tackled with 
spam filtering and reputation mechanisms (Goh et al., 2009). 

The tag cloud visualization, one of the essential socio-technical characteristics of social tagging 
systems, has been holding special research interest for its important role in helping users acquire resources 
(Trant, 2009). The tag cloud offers a visual summary of all the contents, giving users an idea of where to 
begin their information seeking. Scanning it requires less cognitive load than constructing search queries, 
especially suitable for non-specific tasks (Sinclair & Cardew-Hall, 2008). However, the typical tag cloud, 
where related tags are scattered as the result of the alphabetical arrangement, was challenged because 
meaningful connections might be missed (Hearst & Rosner, 2008). A comparative study argued that the 
visualization layout design relied heavily on user purposes (Lohmann et al., 2009). Continuous efforts have 
been made to generate thematically clustered layouts for tag clouds (Hassan-Montero & Herrero-Solana, 
2006; Fujimura et al., 2008; Chen et al., 2009; Aras et al., 2010; Gou et al., 2011). 


3 Method 


3.1 Research Setting 


Douban¹ is one of the most influential social tagging systems on the Web. A Chinese-language site founded 
in 2005, it has attracted more than 66 million registered users from all over the world. Douban is a social 
library system, to be more specific, for people to discover three types of resources — books, movies, and 
music albums, collect them all in one personal library, and share their libraries with others. Similar 
English-language systems include LibraryThing², IMDb³, and Last.fm⁴, which specialize in books, movies, and music 
albums respectively. 

1 http://www.douban.com/ 

As a typical social tagging system, Douban encourages users to tag the resources in their collections 
and aggregates popular tags into tag clouds. Users can also meet friends and form groups in 
the system. Douban currently accommodates 310,000 groups that gather users from the same geographic 
locations, or users sharing common interests or expertise, such as “Chicago”⁵, “Jazz”⁶, and “Python”⁷, just to 
name a few. 

What is special about Douban is that the type of a resource determines the type of the tags assigned
to it. That is, there are book tags, movie tags, and music tags, each constituting an independent folksonomy.
In addition, resource collecting is more complicated than usual. Users have to select one of three tenses
to indicate how familiar they are with the collected resource: future (“I want to read/watch/listen to”),
present (“I am reading/watching/listening to”), or present perfect (“I have read/watched/listened to”).
Nevertheless, this study was conducted regardless of resource/tag types and tenses.

This study defines information seeking in Douban as looking for resources. Every time a user reaches 
a resource page, i.e. the page offering detailed information about the resource, one can say that she finds a 
resource. On the resource page, the user may perform the collecting action or just leave, signaling whether 
she thinks it useful or interesting. If the former, one can say that her information seeking goal is achieved. 

Many social tagging systems contain primarily six categories of webpages, i.e. home page(s),
resource pages, tag pages, user pages, group pages, and search pages, all of which are designed to provide
access to resources. Douban is no exception:


• On home pages (general, book, movie, and music homes), users will come across unexpected
resources recommended by the system, including recent, popular, and classic ones;

• Resource pages and tag pages constitute an information structure where users can navigate
semantically, i.e. access resources similar to current resources or associated with specific tags;

• User pages and group pages constitute a social structure where users can navigate socially,
i.e. access resources liked by other people or groups of people;

• For users with articulable needs, resources matching their queries will be returned on search
(result) pages generated by the internal search engine.


As a whole, a vibrant information seeking environment has developed in Douban. Figure 1 demonstrates a 
navigation map for its information seekers. The stacks represent the above page categories and the thick 
arrows their hyperlinks pointing to resource pages. In addition, thinner arrows are used to indicate other 
available hyperlinks within each page category or across different categories. This map encompasses the 
major possible navigation steps, and each way they are linked up in series will engender a specific 
information seeking path. In particular, the resource collecting action does not belong to any of the page 
categories but may update the content of resource pages. 


2 http://www.librarything.com/
3 http://www.imdb.com/
4 http://www.last.fm/
5 http://www.douban.com/group/chicago/
6 http://www.douban.com/group/jazz/
7 http://www.douban.com/group/python/


Figure 1: Hyperlinks among the major page categories in Douban


3.2 Data Collection and Cleaning 


A random transaction log file was requested directly from Douban. It contains around 20 million clickstream
records generated on the Web server over a 24-hour period. Websites are usually very careful about releasing
transaction logs for fear of violating their users’ privacy. Douban also gave full consideration to this issue
and had a technician encrypt all the user identities in the log file. Specifically, each user was assigned a new
ID, a string of digits that carries no meaning but distinguishes the user from others.

The CSV-formatted file received from Douban was imported into a single table named original_data
in Microsoft Access. This table included five basic data fields: USER_ID, REQUESTED_URL, METHOD,
REFERRING_URL, and TIME. Their descriptions are as follows:


• USER_ID: User’s IP address or username disguised as a 9- or 10-digit number that can be positive
or negative;

• REQUESTED_URL: URL of the page requested by the user (the page can be visited by typing
“http://www.douban.com” + “URL” in a Web browser; this also applies to the REFERRING_URL
field);

• METHOD: Type of request, either “GET” (requesting a page from the Web server) or “POST”
(modifying the data stored on the server);

• REFERRING_URL: URL of the page from which the user accesses the page in the corresponding
REQUESTED_URL field;

• TIME: Exact time when the user makes the request, displayed in the AM/PM format.
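As an illustration of the five fields, the following minimal Python sketch parses a few comma-separated rows into typed records and sorts them by user and time. The column order, the comma delimiter, the use of “-” for a missing referrer, and the names ClickRecord and parse_records are assumptions made for this sketch; the sample rows are adapted from the figure snippets in this section, and the study itself used Microsoft Access rather than Python.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ClickRecord:
    user_id: int          # disguised ID; may be negative
    requested_url: str    # path relative to http://www.douban.com
    method: str           # "GET" or "POST"
    referring_url: str    # "-" is assumed to mean that no referrer was logged
    requested_at: time


def parse_records(text):
    """Parse comma-separated rows holding the five fields in the order listed above."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        uid, req, method, ref, clock = (cell.strip() for cell in row)
        records.append(ClickRecord(
            user_id=int(uid),
            requested_url=req,
            method=method,
            referring_url=ref,
            requested_at=datetime.strptime(clock, "%I:%M:%S %p").time(),
        ))
    return records


# Illustrative rows only (hypothetical CSV layout).
SAMPLE = """2061537706,/book/,GET,/subject/1721591/,8:48:44 PM
2061537704,/subject/3189420/,GET,-,8:45:01 PM
2061537706,/,GET,-,8:43:26 PM
-745952792,/subject/1292220/,GET,/subject/1292220/edit,7:53:24 PM
"""

# Sort by user first and by time second.
records = sorted(parse_records(SAMPLE), key=lambda r: (r.user_id, r.requested_at))
for record in records:
    print(record.user_id, record.requested_url, record.requested_at)
```

Parsing TIME into a time object rather than keeping it as a string means that “8:48:44 PM” sorts after “8:43:26 PM” chronologically and not lexically.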
            /people/rink/  GET  -  8:43:51 PM
2061537704  /j/subject_suggest?q=%E9%87%91%E5%AD%97%E5%A1%94  GET  /people/rink/  8:44:00 PM
2061537704  /j/subject_suggest?q=%E9%87%91%E5%AD%97%E5%A1%94%E5%8E%9F%E7%9  GET  /people/rink/  8:44:05 PM
2061537704  /subject_search?search_text=%E9%87%91%E5%AD%97%E5%A1%94%E5%8E%9  GET  /people/rink/  8:44:05 PM
2061537704  /j/subject_suggest?q=%E9%87%91%E5%AD%97%E5%A1%94%E5%8E%9F%E7%9  GET  /people/rink/  8:44:08 PM
2061537704  /subject/3189420/?i=0  GET  /subject_search?search_text=%E9%87%91%E5%AD%97%E5%A1%94%E5%8  8:44:10 PM
2061537704  /subject/3189420/?i=0  GET  /subject_search?search_text=%E9%87%91%E5%AD%97%E5%A1%94%E5%8  8:44:35 PM
2061537704  /j/subject/3189420/interest?interest=collect&rating=5  GET  /subject/3189420/?i=0  8:44:43 PM
2061537704  /j/subject/3189420/interest  POST  /subject/3189420/?i=0  8:45:01 PM
2061537704  /subject/3189420/  GET  -  8:45:01 PM
2061537705  /group/topic/1865987/  GET  http://www.baidu.com/s?wd=%B6%F5%C2%D7%B4%BA%D0%A1%B3%AA  8:41:13 PM
2061537705  /group/topic/4249302/?start=100  GET  /group/topic/4249302/?from=mb-86987056  8:41:42 PM
2061537705  /group/topic/4249302/?start=200  GET  /group/topic/4249302/?start=100  8:50:33 PM
2061537706  /  GET  -  8:43:26 PM
2061537706  /subject/1427083/?rec=V&rec=V  GET  /  8:48:05 PM
2061537706  /doulist/188962/  GET  /subject/1427083/?rec=V&rec=V  8:48:23 PM
2061537706  /subject/1721591/  GET  /doulist/188962/  8:48:30 PM
2061537706  /book/  GET  /subject/1721591/  8:48:44 PM
2061537706  /book/tag/%E5%93%B2%E5%AD%A6  GET  /book/  8:48:49 PM
2061537706  /book/tag/%E5%93%B2%E5%AD%A6?start=20  GET  /book/tag/%E5%93%B2%E5%AD%A6  8:49:27 PM
2061537706  /book/tag/%E5%93%B2%E5%AD%A6?start=40  GET  /book/tag/%E5%93%B2%E5%AD%A6?start=20  8:49:55 PM
2061537706  /book/tag/%E5%93%B2%E5%AD%A6?start=60  GET  /book/tag/%E5%93%B2%E5%AD%A6?start=40  8:50:17 PM
2061537706  /book/tag/%E5%93%B2%E5%AD%A6?start=80  GET  /book/tag/%E5%93%B2%E5%AD%A6?start=60  8:50:32 PM
2061537706  /book/tag/%E5%93%B2%E5%AD%A6?start=100  GET  /book/tag/%E5%93%B2%E5%AD%A6?start=80  8:50:51 PM


Figure 2: A snippet from Table original_data 


Figure 2 captures a snippet from Table original_data, comprising 24 clickstream records (or rows) belonging
to 3 users, after the table was sorted by USER_ID first and by TIME second. As can be seen, there are
unrecognizable character strings starting with “%” in both the REQUESTED_URL and REFERRING_URL
fields. They are actually Chinese tags or search keywords, percent-encoded on the basis of the UTF-8 encoding
scheme. Given that this study involved no semantic analysis, they were not converted into Chinese characters.
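Although the study left these strings encoded, they can be converted back with standard tools. The following short Python sketch decodes one of the tag URLs shown in Figure 2 with the standard library's urllib.parse.unquote; it is offered only as an illustration and was not part of the original analysis.

```python
from urllib.parse import unquote

# A requested URL taken from the snippet in Figure 2.
encoded_url = "/book/tag/%E5%93%B2%E5%AD%A6?start=60"

# unquote() decodes percent-escapes as UTF-8 by default.
decoded_url = unquote(encoded_url)
print(decoded_url)
```

Some referring URLs from external sites may use a different character encoding, in which case the encoding argument of unquote would have to be set accordingly.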

The cleaning of Table original_data was completed in two steps. The first step was removing
corrupted records, i.e. erroneous data produced when the Web server logged the data incorrectly. Such errors
can be easily detected by sorting each column in sequence, because they do not fit the pattern of the normal
data in the same column and therefore usually appear at the top or bottom of the sorted column or grouped
together within it (Jansen, 2006). Next, a considerable volume of redundant records was eliminated. These
records failed to reflect how ordinary users navigate within Douban, e.g. requests from external sites,
requests by Web search engine robots, and requests for API services. Filtering out such irrelevant data
helped minimize the size of the dataset and expedite the analysis.

After data cleaning, 10,303,684 clickstream records remained in the table, which was then renamed
cleaned_data. The entire METHOD column was deleted because it displayed one invariable value, “GET”, and
the USER_ID, REQUESTED_URL, and REFERRING_URL fields were abbreviated to UID, REQ, and REF
respectively. Table cleaned_data includes 269,658 distinct users: 22% (N = 59,356) of them have only one
record each, 69% (N = 186,914) have 2 to 99 records, and 9% (N = 23,388) have 100 records or more. At the
higher end, there are 638 extreme users, each of whom has no fewer than 1,000 records, and the maximum
number of clickstream records for a single user is 27,050.


3.3 Data Analysis 


The biggest difficulty encountered in this study was that no readily usable method existed for
analyzing the above clickstream data. The popular search log analysis framework, namely, investigating
search log data at the term, query, and session levels, is not applicable here (Jansen, 2008).
Taking into account the characteristics of clickstream data, the researcher introduced the concept of
“movement”, defined in most dictionaries as an act of changing location, to represent every single
clickstream record in Table cleaned_data.

A movement describes a certain user (UID) changing her location within a website, from a
referring page (REF) to a requested page (REQ), at a certain time point (TIME). Another concept,
“footprint”, was employed to refer to the requested page of a record, whose referring page is in turn
the footprint of the previous record. A movement can then be represented as M_i: F_i ← F_{i-1}, where F_i is
the footprint left as a result of M_i. This relationship is illustrated in Figure 3.


Figure 3: The relationship between footprints and movements


Footprints are visits to webpages, so the footprints left in a social tagging system also divide into six major
types: F{H} (Home pages), F{R} (Resource pages), F{T} (Tag pages), F{U} (User pages), F{G} (Group
pages), and F{S} (Search pages). For convenience, the following analysis deemed that the collecting action
generates a seventh type of footprint, F{C}. The type of F_i determines the type of M_i. If F_i ∈ F{R}, M_i is a
“pivotal” movement (PM), as it is called in this paper; if F_i ∈ F{C}, M_i is a “consequential” movement (CM);
otherwise, M_i is a “transitional” movement (TM).

Let us assume that a user follows the tag “interaction design” on Douban’s book home to the book
Don’t Make Me Think and adds it to her library. This process can be decomposed into three movements, as
in Figure 4. The movement from home to tag, which contributes to finding the book later, is transitional.
The PM, i.e. the movement from tag to resource, is directly and indispensably responsible for finding the
book. Collecting the book, which indicates its usefulness to the user, is the CM.
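The decomposition into transitional, pivotal, and consequential movements can be sketched in a few lines of Python. The regular expressions below are assumptions inferred from the URLs visible in the figure snippets (resource pages of the form “/subject/<id>/”, collecting actions of the form “/j/subject/<id>/interest?interest=…”); they are not Douban's documented URL scheme. The resource ID 1234567 and the tag URL are hypothetical placeholders for the example above.

```python
import re

# Assumed URL patterns, inferred from the figure snippets.
RESOURCE_PAGE = re.compile(r"^/subject/\d+/(\?.*)?$")
COLLECT_ACTION = re.compile(r"^/j/subject/\d+/interest\?interest=(collect|wish|do)\b")


def movement_type(requested_url):
    """Classify one clickstream record by the footprint it leaves (its REQ)."""
    if RESOURCE_PAGE.match(requested_url):
        return "PM"  # footprint in F{R}: pivotal movement
    if COLLECT_ACTION.match(requested_url):
        return "CM"  # footprint in F{C}: consequential movement
    return "TM"      # any other footprint: transitional movement


# (REF, REQ) pairs for the three steps; URLs are hypothetical placeholders.
path = [
    ("/book/", "/book/tag/interaction%20design"),                 # home -> tag
    ("/book/tag/interaction%20design", "/subject/1234567/"),      # tag -> resource
    ("/subject/1234567/", "/j/subject/1234567/interest?interest=collect"),
]

types = [movement_type(req) for _ref, req in path]
print(types)
```

Only the requested page decides the movement type in this sketch; the referring page becomes relevant later, when the information seeking mode of a PM is determined.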


Figure 4: An illustration of transitional, pivotal, and consequential movements 


PMs are critical to addressing the first research question, since the footprints one step prior to F{R} provide
the most reasonable and reliable evidence of how users find resources. In other words, for an M_i
that is a PM, the type of F_{i-1} determines its characteristic information seeking mode. Therefore
a new table, pivotal_data (Figure 5), was created by selecting from Table cleaned_data all the records with a
resource page URL (e.g. “/subject/3189420/”) in the REQ field. For example, with a search page
URL displayed in the corresponding REF column, the first row in this table, i.e. PM: F{R} ← F{S}, features
the searching mode. The researcher manually distinguished all the modes adopted, based on a thorough
inspection of the entire REF field after sorting. The popularity of each mode was then measured by the
number of PMs featuring that mode, denoted by Np.
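The selection of PMs and the counting of Np per mode might be automated along the following lines. This is only a sketch: the referrer rules are guesses based on the URLs visible in the figure snippets, the study distinguished the modes manually, and the sketch does not separate monitoring from browsing by user or group, since that distinction depends on information beyond the URL pattern. The names REF_RULES, seeking_mode, and count_modes are hypothetical.

```python
import re
from collections import Counter

RESOURCE_PAGE = re.compile(r"^/subject/\d+/(\?.*)?$")

# Assumed referrer patterns (not a documented URL scheme); first match wins.
REF_RULES = [
    ("searching", re.compile(r"^/(subject_search|(book|movie|music)/search)")),
    ("browsing by tag", re.compile(r"^/(book|movie|music)/tag/")),
    ("browsing by resource", re.compile(r"^/(subject/\d+/(\?.*)?$|doulist/)")),
    ("browsing by group", re.compile(r"^/group/")),
    ("browsing by user", re.compile(r"^/people/")),
    ("encountering", re.compile(r"^/((book|movie|music)/)?$")),
]


def seeking_mode(referring_url):
    """Map the REF of a pivotal movement to a mode, or None if no rule applies."""
    for mode, pattern in REF_RULES:
        if pattern.match(referring_url):
            return mode
    return None  # e.g. "-" or a page outside the six categories


def count_modes(records):
    """records: iterable of (REQ, REF) pairs; returns Np per mode."""
    counts = Counter()
    for req, ref in records:
        if RESOURCE_PAGE.match(req):  # keep pivotal movements only
            mode = seeking_mode(ref)
            if mode is not None:
                counts[mode] += 1
    return counts


# (REQ, REF) pairs adapted from the figure snippets.
sample = [
    ("/subject/2228604/", "/movie/"),
    ("/subject/1918707/", "/subject/1389535/"),
    ("/subject/1457449/?i=5", "/music/search/The%20Seatbelts"),
    ("/subject/2170629/", "/doulist/61053/"),
    ("/subject/1307657/", "/movie/tag/%E7%A7%91%E5%B9%BB?start=160"),
    ("/subject/1292220/", "/subject/1292220/edit"),          # no mode assigned
    ("/doulist/188962/", "/subject/1427083/?rec=V&rec=V"),   # not a PM
]
np_by_mode = count_modes(sample)
print(np_by_mode)
```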
             2304115  /subject/2304115/?i=0  /subject_search?cat=1001&search_text=%E6%AF%94%EB%BE%  7:53:24 PM
1961049911   2342570  /subject/2342570/?i=1  /subject_search?cat=1002&search_text=+%E6%9F%B3%E4%BA  7:53:24 PM
1965504750   2228604  /subject/2228604/?rec=1  /movie/  7:53:24 PM
1968229564   1780749  /subject/1780749/?rec=V  /  7:53:24 PM
1968765005   1918707  /subject/1918707/  /subject/1389535/  7:53:24 PM
1969041548   2311147  /subject/2311147/  /subject/2157131/  7:53:24 PM
2005084022   1307657  /subject/1307657/  /movie/tag/%E7%A7%91%E5%B9%BB?start=160  7:53:24 PM
2045420428   1891179  /subject/1891179/  /  7:53:24 PM
2071613577   1863731  /subject/1863731/?i=0  /subject_search?search_text=%E7%A5%IE%E6%BE%A2%E7%A  7:53:24 PM
2103294700   1305472  /subject/1305472/?i=88  /subject_search?start=75&search_text=%E4%BB%BB%E5%BEX%E  7:53:24 PM
2105515094   3322741  /subject/3322741/?i=0  /subject_search?search_text=%E5%A6%82%E6%9E%9C%E4%B  7:53:23 PM
-574095283   3048031  /subject/3048031/?i=0  /subject_search?search_text=%E6%B3%AA%E7%97%95%E5%B5  7:53:24 PM
-587168995   1891179  /subject/1891179/  /subject/1891179/discussion?start=60  7:53:24 PM
-588756536   1457449  /subject/1457449/?i=5  /music/search/The%20Seatbelts  7:53:24 PM
-592940630   1482072  /subject/1482072/?i=0  /movie/search/Anne%20Hathaway  7:53:24 PM
-612303642   1424741  /subject/1424741/  /subject/1467776/  7:53:24 PM
-624567828   2170629  /subject/2170629/  /doulist/61053/  7:53:24 PM
-636274061   2007083  /subject/2007083/?i=0  /subject_search?search_text=%E5%8D%97%E6%96%B9%E7%9A  7:53:24 PM
-745952792   1292220  /subject/1292220/  /subject/1292220/edit  7:53:24 PM
-876888155   1295873  /subject/1295873/  /subject/1293234/  7:53:24 PM
975530174    1299059  /subject/1299059/  /subject/1294114/  7:53:24 PM


Figure 5: A snippet from Table pivotal_data 


As for the third research question, which concerns the effectiveness of each mode, this study coined the
“achievement rate” as a basic measure. There is an analogy between collecting a resource in a social tagging
system and purchasing a product in an online retail store, because both suggest satisfaction with an
item. E-commerce researchers have been using the “conversion rate”, the percentage of website visits that
result in order submissions, to measure the effectiveness of merchandising efforts (Lee et al., 2001;
Ferrini & Mohr, 2008; Booth & Jansen, 2008).

Just as not all visits convert into purchases, not all resources found end up being collected. That
is, not all PMs are followed by CMs. Another new table, consequential_data (Figure 6), was created by
selecting from Table cleaned_data all the records with a collecting action URL (e.g.
“/j/subject/3189420/interest?interest=collect”) in the REQ field. By jointly querying Tables pivotal_data and
consequential_data, the researcher was able to tell which PMs were actually followed by CMs and counted
them as effective PMs. The achievement rate of an information seeking mode was defined as the percentage
of effective PMs among all PMs featuring that mode. Denoting the number of effective PMs by Nc, the
achievement rate is R = Nc / Np.
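A minimal sketch of this joint query is given below in Python. It assumes that a PM and a CM are matched on the pair (UID, resource ID) and ignores the time order of the two records; the paper does not state the exact join condition, so both points are assumptions. All values are toy data, not results from the study.

```python
from collections import Counter

# Pivotal movements as (UID, resource ID, mode); toy values only.
pivotal = [
    (1, 101, "searching"),
    (1, 102, "browsing by tag"),
    (2, 101, "searching"),
    (3, 103, "browsing by tag"),
    (3, 104, "searching"),
]

# Consequential movements as (UID, resource ID) pairs; toy values only.
consequential = {(1, 102), (2, 101)}

# Np: all PMs per mode; Nc: PMs whose user also collected the same resource.
n_p = Counter(mode for _uid, _rid, mode in pivotal)
n_c = Counter(mode for uid, rid, mode in pivotal if (uid, rid) in consequential)

# Achievement rate R = Nc / Np for every mode that has at least one PM.
rate = {mode: n_c[mode] / n_p[mode] for mode in n_p}
print(rate)
```

With these toy values, one of three searching PMs and one of two tag-browsing PMs are effective. A stricter version would additionally require the CM to occur after the PM.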


             1787981  /j/subject/1787981/interest?interest=do  /subject/1787981/  11:10:02 AM
1033415492   1016060  /j/subject/1016060/interest?interest=collect  /subject/1016060/  11:09:59 AM
1124700798   1471556  /j/subject/1471556/interest?interest=wish  /subject/1471556/?rec=1  11:10:02 AM
1950746264   1300299  /j/subject/1300299/interest?interest=wish  /subject/1300299/  11:10:05 AM
1961113862   3238176  /j/subject/3238176/interest?interest=do  /subject/3238176/  11:09:58 AM
2032304833   1048209  /j/subject/1048209/interest?interest=collect&rating=  /subject/1048209/  11:10:04 AM
2073359707   1308807  /j/subject/1308807/interest?interest=collect&rating=  /subject/1308807/?i=0  11:10:04 AM
2085538177   3156578  /j/subject/3156578/interest?interest=collect  /subject/3156578/  11:09:58 AM
-554224332   1422089  /j/subject/1422089/interest?interest=wish  /subject/1422089/  11:09:59 AM
-587635321   2042226  /subject/2042226/?interest=collect&ck=RQe3  /subject/2042226/?i=0  11:09:58 AM
-591470487   3268216  /j/subject/3268216/interest?interest=collect  /subject/3268216/  11:10:03 AM
-635681130   1819912  /j/subject/1819912/interest?interest=collect  /subject/1819912/  11:10:02 AM
-636185912   2059456  /j/subject/2059456/interest?interest=collect  /subject/2059456/  11:10:03 AM
-636185912   1293422  /j/subject/1293422/interest?interest=collect  /subject/1293422/  11:10:06 AM
-636363481   1896550  /j/subject/1896550/interest?interest=collect  /subject/1896550/?rec=A  11:09:59 AM
-637161886   1926728  /j/subject/1926728/interest?interest=wish  /subject/1926728/?i=0  11:10:06 AM
-769610710   1292276  /j/subject/1292276/interest?interest=wish  /subject/1292276/?i=0  11:09:59 AM
974356089    1297102  /j/subject/1297102/interest?interest=collect  /subject/1297102/?from=mb-86815121  11:09:59 AM
989245374    2132495  /j/subject/2132495/interest?interest=collect  /subject/2132495/  11:10:01 AM
993071334    1303394  /j/subject/1303394/interest?interest=collect  /subject/1303394/  11:10:03 AM
994221281    1409704  /j/subject/1409704/interest?interest=collect  /subject/1409704/  11:10:07 AM


Figure 6: A snippet from Table consequential_data


3.4 Limitations


The above research method may be limited by three major factors. First of all, the chosen research setting,
Douban, is a language-specific social library system. Although it serves a remarkably large number of users,
the vast majority of them belong to the Chinese-speaking world. Both similarities and differences have
been found between Web searching in Chinese and in English (Chau et al., 2007), yet so far there is
no evidence that language differences affect users’ adoption of information seeking modes. Second, the
time span of the transaction log file requested from Douban is relatively short, only one day. Fortunately,
the considerable size of the data, exceeding 20 million records, compensated for this to a certain extent. Last
but not least, clickstream data analysis was the only method adopted in this research. Although
transaction logs provide rich, unaltered information about users’ behavior, they contribute little to the
exploration of users’ personal characteristics, which may directly influence the ways they behave. It is
hence suggested that other methods, e.g. surveys, be introduced to address this shortcoming (Jansen,
2008).


4 Results 


Table pivotal_data includes a total of 1,016,808 PMs, involving 139,874 distinct users and
127,759 distinct resources. In Table consequential_data, the CMs add up to 239,463, involving 38,251
distinct users and 54,675 distinct resources. Therefore, among the 269,658 distinct users in Table
cleaned_data, only 52% visited Douban on that day, wholly or partly, for the sake of information seeking,
and only 27% of these information seekers eventually made additions to their libraries.
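The two percentages follow directly from the user counts reported above, as the short Python check below shows. It assumes that the 38,251 collecting users are a subset of the 139,874 users with at least one PM; the variable names are illustrative.

```python
total_users = 269_658   # distinct users in Table cleaned_data
seekers = 139_874       # distinct users in Table pivotal_data
collectors = 38_251     # distinct users in Table consequential_data

# Share of all users who found at least one resource, in percent.
seeker_share = round(100 * seekers / total_users)

# Share of those information seekers who collected something, in percent.
collector_share = round(100 * collectors / seekers)

print(seeker_share, collector_share)
```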

The focused inspection of the REF field in Table pivotal_data resulted in the recognition of all 
major types of footprints, i.e. F{H}, F{R}, F{T}, F{U}, F{G}, and F{S}. That’s to say, Douban users in 
reality did avail themselves of all the available access points, including home, resource, tag, user, group, 
and search pages, to acquire resources. These intermediaries act on resource finding in different manners, 
giving shape to different information seeking modes adopted by the users: 


Encountering: F{R} ← F{H};
Browsing by resource: F{R} ← F{R};
Browsing by tag: F{R} ← F{T};
Browsing by user: F{R} ← F{U};
Browsing by group: F{R} ← F{G};
Searching: F{R} ← F{S};
Monitoring by user: F{R} ← F{U}; and
Monitoring by group: F{R} ← F{G}.


Searching is using the internal search engine to perform keyword search, which is the most readily 
understandable mode. Encountering takes place on home pages because the recommendations of resources 
there are made for all the people. If a resource catches a user’s attention, it must happen to satisfy her 
interest or arouse her curiosity. And all she needs to do is an effortless click. 

When browsing, in contrast, users are much more involved. They have to identify useful leads on 
their vague goals along the way. There are semantic leads, i.e. resources that cover specific topics and tags 
that describe specific topics. Meanwhile, users have their personal interests and groups common interests, 
and these are social leads. Thanks to the richness of hyperlinks, users are able to make pivot navigation 
and easily follow such leads to desired resources, achieving browsing by proxy. 

Users and groups, moreover, can serve as trusted sources for monitoring. In this case, they are the
users one has connected with and the groups one has joined, rather than previously unknown, random
ones. Keeping an eye on the updates to their resource collections may serve socializing purposes or
information seeking. Monitoring and browsing by user/group are represented in the same form above. In
the analysis, however, users or groups accessed by signed-in individuals from their own profile pages were
considered to be sources of monitoring.

Figures 7 and 8 show the results obtained from analyzing the popularity and effectiveness of each 
information seeking mode respectively. The larger the value of Np, the more popular a mode is. Higher 
achievement rate R means greater effectiveness of a mode. 


Monitoring by user; 18,087; 1.84%
Monitoring by group; 1,277; 0.13%
Searching; 264,374; 26.90%
Browsing by group; 10,330; 1.05%
Browsing by user; 84,411; 8.59%
Browsing by tag; 113,357; 11.54%

Figure 7: The popularity of different modes (Np; proportion) 


Browsing by tag; 37,756; 33.31%
Encountering; 44,548; 27.19%
Searching; 61,919; 23.42%
Browsing by resource; 61,109; 18.69%
Browsing by user; 13,070; 15.48%
Monitoring by user; 2,456; 13.58%
Monitoring by group; 139; 10.88%
Browsing by group; 1,122; 10.86%


Figure 8: The effectiveness of different modes (Nc; R = Nc/Np) 
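As a consistency check, the achievement rates in Figure 8 can be re-derived from the counts reported in Figures 7 and 8 using R = Nc / Np. The Python sketch below does so for the six modes whose Np values appear in Figure 7 as reproduced here; the dictionary layout and variable names are illustrative.

```python
# mode: (Np from Figure 7, Nc from Figure 8)
reported = {
    "Browsing by tag": (113_357, 37_756),
    "Searching": (264_374, 61_919),
    "Browsing by user": (84_411, 13_070),
    "Monitoring by user": (18_087, 2_456),
    "Monitoring by group": (1_277, 139),
    "Browsing by group": (10_330, 1_122),
}

# Achievement rate R = Nc / Np, expressed as a percentage with two decimals.
rates = {
    mode: round(100 * n_c / n_p, 2)
    for mode, (n_p, n_c) in reported.items()
}

for mode, r in rates.items():
    print(f"{mode}: {r:.2f}%")
```

The recomputed values agree with the percentages shown in Figure 8 for these six modes.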


It is a little surprising that browsing by resource is the most popular mode, even exceeding searching. This 
mode takes two forms in Douban: one can browse “people like this also like” or “Doulist” for similar 
resources. The former is based on collaborative filtering (Linden et al., 2003), whereas the latter is a user- 
compiled list that contains a number of resources sharing certain attributes. The ratio of their adoption 
frequencies is approximately 5:1, indicating a clear preference for the former. Despite its leading popularity, 
the mode of browsing by resource has a poor achievement rate, even lower than the average level (22.60%). 
It can be inferred that the system- and human-determined similarity between resources failed to come up 
to users’ expectations. 

Another information seeking mode presenting interesting results is browsing by tag. The popularity
of this mode is not competitive in Douban, which suggests that general users are insufficiently aware of
the role of social tags in aiding exploration. But those who attempted to obtain resources of value via tags
of interest had a 1-in-3 chance of succeeding, making browsing by tag the most effective mode. Such a
finding seems to contradict previous criticisms of folksonomies (Morville, 2005) and may to a certain extent
relieve the worries about their deficiencies, especially the vocabulary problem (Golder & Huberman, 2006).

The searching and encountering modes both rank among the top three in terms of popularity and
effectiveness. While searching has unarguably been the dominant mode of people’s everyday online
information seeking (Tombros et al., 2005), Douban users did not depend so heavily on the internal search
engine. The moderate popularity of encountering is understandable because a considerable part of what we
know is absorbed this way (Bates, 2002). Unexpectedly, however, encountering, the passive and undirected
mode, is more effective than searching, the active and directed mode. This is an intriguing finding that
deserves further investigation.

The rest of the modes, i.e. browsing/monitoring by user/group, are all socially oriented. They are 
not only less frequently adopted, but also less likely to lead users to useful resources. Social tagging systems 
assume a dual role as information repositories and social platforms. It appears that Douban users established 
a clear mental boundary between the two facets, and seldom interwove information seeking with social 
networking activities. 


5 Discussion and Conclusions 


The clickstream data analysis identified eight general information seeking modes that were adopted by 
social tagging system users, including encountering, browsing by resource, browsing by tag, browsing by 
user, browsing by group, searching, monitoring by user, and monitoring by group. They have their roots in 
the theories of information seeking behavior (Bates, 2002), but develop in the context of social tagging 
systems. In fact, the universal tagging elements comprise only resources, tags, and users (Smith,
2008). However, this study also took into account two functional design elements, the home page and
interest groups, which have become increasingly important in the architecture of social tagging systems
during the past few years.

Firstly, the home page design of these systems now places less emphasis on navigational purposes and
instead pays more attention to content aggregation for users’ convenience. Secondly, the design of the social
interaction supported in these systems now also includes groups, which allow users to share information on
common interests. Such changes have taken place or are taking place in most systems, and they have
a profound influence on users’ information seeking behavior. As a whole, the ways users look for information
in social tagging systems are greatly diversified by virtue of the connectivity among home, resources, tags,
users, and groups, as illustrated in Figure 1.

Experimental research of encountering is difficult to design because it’s hard to anticipate who will 
acquire information in this way, where they will acquire information, or what information they will acquire 
(Erdelez, 2004). Such uncertainties are less obvious in the setting of social tagging systems. Being more 
social-oriented, they deliberately push information resources to users on their home pages, the common 
places for everyone. These resources are usually limited and will be updated frequently. If one can find a 
resource of interest on the home page, therefore, it is completely opportunistic. 

Although resources can be encountered elsewhere in social tagging systems, e.g. by running across a
resource when reading a group discussion that refers to it, such occurrences are negligible compared to
those on the home page. As uncovered in the clickstream data analysis, encountering on the home page
was quite popular among Douban users, accounting for 16.67% of all the resource finding occurrences, the
third highest share. This mode will probably be popular in other social tagging systems as well, because
visits to any website usually start with the home page. Consciously or unconsciously,
users will notice the potentially interesting resources appearing there.

Meanwhile, the encountering mode was quite effective in helping users find the resources they needed,
with the second highest achievement rate (27.19%). But this result might be specific to Douban. This
particular system has devoted a lot of effort to resource recommendation and has achieved great
success. It carefully selects hundreds of recent, popular, and quality resources, and presents them to the
users in a systematic manner. So in a system that does not have a comparable abundance of resources
and/or lacks organization of resources on its home page, the effectiveness of this mode may not be that
high.

Browsing in social tagging systems is sometimes not clearly distinguishable from encountering,
because browsers also feel that they acquire information effortlessly. For example, on Douban’s resource
pages, the co-collected resources, if there are any, are just one click away. Nevertheless, browsing differs
from encountering in that it involves a proxy (McKenzie, 2003), be it a resource, a tag, a user, or a group. If
a user is about to view the resources associated with a proxy, she is aware that they should be related to
the proxy in some way. Although the user does not have a particular goal in mind, the subject or interest
of the proxy represents her information need to a certain extent. Encountering, on the contrary, is viewing
resources not associated with any proxy.

Among the eight information seeking modes identified, browsing by resource helped the users find
33% of the resources they found, which made it the most popular mode. It is the most straightforward
approach to acquiring related resources and takes two forms in Douban: browsing co-collected resources
and browsing user-compiled lists of similar resources. Nevertheless, browsing related resources is not a
ubiquitous mode. It is mostly supported in social library systems, and not all of them support both forms.
For example, Discogs⁸ does not support the former. In spite of its popularity in Douban, this mode had an
achievement rate (18.69%) even lower than the average of all the modes, suggesting unsatisfactory
effectiveness. In particular, browsing co-collected resources will often lead users to resources that they
have already viewed or collected.

In contrast, browsing by tag was the most effective mode among the eight, though it demonstrated
only moderate popularity. Users tag resources in order to find them again later and help others
discover them (Trant, 2009). Following tags to acquire resources is, so to speak, the characteristic
information seeking mode in social tagging systems. But the clickstream data analysis showed that it was
only the fourth most frequently adopted mode. One cannot yet say whether the mode is less popular in
other systems too, because Douban users might be reluctant to use the tag cloud due to its low usability,
which was a problem particular to this system. Tags have attracted many doubts about their findability since
they started to gain prevalence on the Web (Morville, 2005). However, it was found that the achievement
rate of browsing by tag reached as high as 33.31%, meaning that roughly one in every three resources found
via tags was collected. Because tags are semantic expressions, further investigation is needed to
reveal whether tags in other languages also have high findability.

Compared to the dominant role of Web search engines in general information seeking, the internal
search engines provided by social tagging systems affect their users much less significantly in resource
finding. In the case of Douban, the searching mode failed to win overwhelming adoption, ranking second
in terms of popularity, and moreover, its achievement rate (23.42%) implies merely acceptable effectiveness.
In fact, this mode is mainly appropriate for tasks with specific goals. The disadvantages of Douban’s search
engine are very common in other social tagging systems, such as Flickr and IMDb: the recognizable search
keywords are limited and the search results lack ranking. Interestingly, these are just trivial problems
when the search engines are used for known item search.

The remaining four modes, i.e. browsing by user/group and monitoring by user/group, are all 
characteristic of information seeking by social proxy. This is looking for resources through an intermediary 
who is a particular person or a cluster of similar persons. Users and groups, as proxies, are not very different 
from each other. Both of them are describable with major interests, and the subjects of their collected 
resources should be able to reflect such interests. The browsing and monitoring modes, however, work in
different manners, with the former associated with newly discovered or unfamiliar users or groups and the
latter with those that people have established long-term relationships with. Before one starts to monitor a user
or a group, she usually needs to browse it first to determine whether it is a useful information
source.


8 http://www.discogs.com/

Based on the results of the clickstream data analysis, these four social-oriented modes were neither 
popular nor effective. They together only explained a little more than 10% of all the occurrences of resource 
finding and their average achievement rates (12.70%) were far below the overall average level. These may 
not be formal modes or they may be applicable only to users who had a passion for social activities. Social 
tagging systems, after all, are not social networking services such as Facebook and LinkedIn, which
connect people who are real-world acquaintances and enable them to meet new friends through the old ones. 
The first and foremost goal here is finding resources of interest, and the finding of interesting users or groups 
is the byproduct. In addition, browsing or monitoring a user/group’s collections is usually interwoven with 
browsing or monitoring that user/group’s other information or updates. That is to say, people can be easily 
distracted from information seeking when adopting these modes.


Abstract

Social news sites allow their users to submit and vote on online news stories, thereby bypassing the 
authority and power of traditional newspaper editors. In this paper we explore what motivates users of 
social news sites, such as Reddit, to participate in this collaborative editorial process. We present a tiered 
framework of motivational factors for participating on social news sites, based on a comprehensive 
literature review, drawn from fields like social media research, sociology, (social) psychology, and 
behavioral economics. We then validate this framework through a survey deployed on Reddit and use 
the results of this survey to focus the motivational framework for the social news domain. The recreational
value of the information posted to Reddit, along with the powerful possibilities for customization, appear
to be the strongest incentives for using Reddit. Perhaps surprisingly, the social aspect of social news
sites is not a motivating factor for the majority of Reddit users. Influencing the placement and reception 
of news stories in their niche communities of interest is what draws people to sites such as Reddit.


1 Introduction

Over the past two decades, online news websites have taken slow but steady strides towards increasing user 
involvement through commenting on news articles and easy article sharing on social networking sites like 
Facebook and Twitter. However, the final decision of which articles get the greatest exposure through 
placement at the top of the digital front page still rests solely with the news website’s editors. Social news 
sites do away with this last remnant of expert-based control by allowing the users themselves to vote on 
which stories deserve the greatest exposure, and even submit stories to these websites. 

But what motivates users of social news sites like Reddit and Digg to participate in these activities? 
Is it the social aspect of connecting with friends and like-minded users that motivates users to participate 
on social news sites? Or is it perhaps the shared power to vote on which stories should make it to the front 
page that is attractive to users? 

So far, there have been only a handful of approaches that have examined the motivations of users 
of social news sites. Lerman (2007) tracked the behavior of the top 1000 most active users on Digg over the 
course of a year, and found that competition for the top spot on the ‘Top 1000 users’ list was not as powerful 
a motivator as social recognition, and positive recognition in particular. However, Lerman only examined 
interaction patterns and did not ask Digg users directly what motivated them to participate. Halavais 
(2009), looking specifically at commenting patterns on Digg, also found that positive feedback in the form 
of comments and positive moderation votes motivate users to keep participating. 

Other incentives for user participation have been studied for other types of online communities, 
such as Wikipedia, newsgroups, open-source collaborations, and micro-blogging (see Section 2 for a 
comprehensive overview). To the best of our knowledge, however, no comprehensive framework of 
motivations for participation on social news sites has been created or investigated. 


iConference 2014 Toine Bogers & Rasmus Wernersen 


In this paper, we present such a framework of motivational factors for social news sites. As opposed 
to Lerman (2007) and Halavais (2009), who analyzed user behavior on Digg to focus only on two specific 
motivational factors, we cast a wider net for possible participation incentives and examine their importance 
empirically through a survey of 282 Reddit users. Our contributions in this paper are threefold: 


• A comprehensive literature review of the motivations for online user participation, drawn from
fields like social media research, sociology, (social) psychology, and behavioral economics.

• The organization of these motivational factors into a coherent framework for social media use.

• An empirical validation of this motivational framework through a survey deployed on Reddit, the
largest social news site at the time of writing¹.


The remainder of this paper is organized as follows. The next section contains a review of the related work 
on motivational factors for participating in social media. Section 3 describes our methodology, while Section 
4 describes the motivational framework we derived based on our analysis of the related work. Section 5 
presents the results of the empirical validation of this framework. We conclude in Section 6. 


2 Related work 


We present a broad overview of related work on incentives for online user participation in this section. 
There are different ways of grouping together related work on incentives for participation. For instance, 
Rafaeli & Ariel (2008) organize their overview of the different possible motivations for contributing to wikis 
by the scientific discipline they originated from. Kaplan and Haenlein (2010) categorize social media by the 
degree of social presence as well as the amount of self-presentation. We have elected to group related work 
together by domains instead: social news sites, mailing lists and newsgroups, online communities, and online 
collaboration initiatives, such as Wikipedia and open-source projects. This overview of motivational factors 
will then be condensed and organized into a coherent framework of user motivations in Section 4. 


2.1 Social news sites 


In recent years, there have been only a handful of approaches that have examined (a subset of) motivations 
of users of social news sites. Lerman (2007) analyzed user behavior on Digg and found that competition (in
the form of ‘Top 1000 users’ lists) is not as powerful a motivator as social recognition. Positive recognition
was found to motivate users to stay active or become more active, whereas negative recognition could have
negative effects on community longevity². Studying the spread of interest in news stories on Digg, Lerman
et al. (2008, 2012) found that stories that spread mainly outside a submitter’s local community of friends
are much more likely to become popular on Digg. This suggests that one’s reputation is not tied to one’s
friendships on a social news site, which means these motivational factors are not necessarily related. 

Halavais (2009) looked at commenting patterns on Digg and found that getting feedback is an 
important motivator: users are more likely to keep commenting if they receive positive moderation votes 
and comments on their own comments. Sadlon et al. (2009) view Digg story submissions and promotion as 
an ecology and found that reciprocity is an important factor in user behavior, and a good predictor for 
which stories get promoted. 


2.2 Social media 


Most other research has focused on the motivations for using social media in general. For instance, 
Brandtzæg and Heim (2009) investigated the motivations for using Norway-centered social networking sites. 
They found that the most important reasons for using such sites were getting in contact with new people,
staying in touch with existing friends, and general socializing. Other motivations that emerged from their
survey were accessing information and staying informed about events, debating and discussing topics with
others, and procrastination and entertainment.


1 According to http://mashable.com/2011/04/28/reddit-digg-traffic/, last accessed February 21, 2013.

2 We will use the terms ‘group’ and ‘community’ interchangeably in this paper.

Brandtzæg and Heim (2007) also looked at the other side of the coin: what motivates people to 
withdraw from social media websites? Common reasons for withdrawing from social media included a lack 
of friends or interesting people attending, low quality content, low usability, and a lack of entertainment 
value in general. 

Kietzmann et al. (2011) present a framework for defining social media by using seven functional 
building blocks: identity, conversations, sharing, presence, relationships, reputation, and groups. Some 
motivational factors they mention are meeting new like-minded people and incentives related to personal 
growth, such as building self-esteem, learning about new topics, sharing information, and making a positive 
ideological impact. 


2.3 Newsgroups & mailing lists 


Joyce and Kraut (2006) analyzed posting behavior of newcomers on newsgroups and found that positive
recognition in the form of comments and responses to newcomers’ initial posts motivated them to stay
active in those newsgroups. The quality of the comments and information was not found to have an effect 
on their posting behavior. 

Arguello et al. (2006) analyzed posting behavior in newsgroups in general and found that the 
information context, the poster’s prior engagement in the community, and the post content all influenced 
the response rate. Their study suggests that people are motivated to continue posting in newsgroups and 
mailing lists if the activity and response levels in a community are high. In addition, friendship relations 
with other users and the right amount of stories posted serve as incentives to continue participating in these
communities.


2.4 Online communities 


Ridings et al. (2002) looked at the effects of trust on virtual communities and found that it can be a powerful 
incentive for users when sharing information in the virtual community. Ridings & Gefen (2004)
also investigated the reasons for participating in virtual communities. While
motivations varied strongly depending on the type of virtual community, information exchange was the 
most popular reason across community types. Other important motivations were friendship, recreation and 
entertainment, as well as the technical functionality of the community website. Lampel and Bhalla (2007) 
found that the desire to be recognized and achieve status is particularly important to understanding the 
motivations of those who contribute to virtual communities. Brown and Capozza (2006) discuss group 
identity and its influence on people’s social identities and self-evaluation. Their work suggests that 
strengthening inter-group ties—for instance through joining online communities with a strong presence of 
existing friends—can be a strong incentive for joining new and existing communities. 

Oliver and Marwell (1998) looked at the effect of group size on collective action and found that 
group size as well as the costs of collective goods have an effect on the amount of collective action undertaken 
by the group. Milgram et al. (1969) examined the effects of group sizes in an offline setting by investigating 
the drawing power of different-sized crowds. They found that the size of a crowd has an influence on the 
behavior of individuals outside that group, with larger crowds making it more likely for outside individuals 
to exhibit the same behavior. This suggests that group size could also be an incentive for participating in 
group activities on social media. 

Altruism can be another motivation to contribute to online communities according to Ren and 
Kraut (in press), although this is less likely if the community is large or if people believe other community 
members are already contributing. Identification with the group as a whole (social belonging) and 
interpersonal bonds with individual members (friendship) can also motivate people to participate for a
longer period of time (Sassenberg, 2002). Repeated interactions make such interpersonal bonds stronger and
more likely to occur. Reputation is another factor that motivates people to participate in online communities 
as well as the enjoyment they derive from reading and posting online (Ridings & Gefen, 2004). 


2.5 Online collaboration 


Much of the work done through online collaboration is on a volunteer basis. Clary et al. (1998) divided the 
motivations for volunteering into six categories. One of their categories, ‘enhancement’, addresses the need 
for recognition, personal growth, and self-esteem. Moderators on Reddit may be motivated to do their work 
for similar reasons. 

Nov (2007) looked specifically at the motivations for contributing to Wikipedia. He found that 
these motivations range from the joy of writing to motivations related to personal growth, such as the
opportunity to learn new things and the desire to contribute to knowledge in the global society. Nov created
a survey with questions corresponding to Clary’s six categories (Clary et al., 1998), and correlated these
with contribution levels of Wikipedians. The motivations he looked at were protective, values, career,
social, understanding, enhancement, fun, and ideology. He found that the joy of writing (fun), learning 
about new things (enhancement), and alleviating loneliness (protective) showed the strongest significant 
correlations with contribution levels. Rafaeli and Ariel (2008) present a comprehensive overview of the 
different possible motivations for contributing to wikis, as organized by scientific disciplines. Common 
factors that originate from many different disciplines are the desire for personal growth, reciprocity and 
reputation. A sense of community and commitment to it and the prestige of the community as a whole are 
also important, as well as socializing using communicative facilities. Other powerful incentives are people’s 
intrinsic desire for pleasure, entertainment, and aesthetics, as well as the perceived informational value of 
the wiki. This also suggests that both the information quality and quantity could be important to users of 
social news sites. 


2.6 Miscellaneous 


The related work below does not belong to a single unified domain, but provides additional possibilities for
why people are motivated to participate on social news sites and social media in general. One of the seminal 
works on human needs is Maslow’s hierarchy of needs that drive human activity (Maslow, 1954). This 
hierarchy could also provide suggestions for what drives user activity on social news sites. While the bottom 
levels of physiological and safety needs are less likely to be relevant for participating in social news sites, 
the needs for love and belonging, esteem (self-esteem, achievement, and recognition of and by others), and
self-actualization could be relevant incentives for participation here.

Fogg et al. (2003) examined the factors that affect the credibility of websites. They found that the
design and usability of a website were among the most important factors influencing its credibility. This
suggests that design and usability could also be important incentives for using social news sites. In their work on
e-commerce paradigms, Hoffman and Novak (1997) argue that the World Wide Web has become a prime
source of information for satisfying transactional information needs. This could be one of many possible
motivations for using Reddit: finding out more about new or existing products. We expect this motivation
to play only a minor role though, if any. Su et al. (2011) looked at the motivations for purchasing
direct-to-consumer genetic testing. While this is different from social news sites in many ways, some motivational
factors are likely to be shared, such as curiosity and fascination, as well as recreational and ideological 
reasons, such as contributing to research. 

Jakobsson (2011) looked at achievement systems for console gaming, which are often considered
extrinsic rewards for playing games. Intrinsic motivations, such as interest and enjoyment of the games
themselves, are the other side of the coin. Jakobsson also argues that achievement systems play on our
desire for a good reputation to keep drawing people into games. However, taken to the extreme, they can also turn
participation into a chore. The same could hold for posting on social news sites.


3 Methodology 


In the previous section, we reviewed a broad range of related work on incentives for user participation in 
social news sites as well as other domains, such as online communities, newsgroups, and online collaboration. 
Section 3.1 describes how we combined these different factors into a coherent framework. Section 3.2 
describes how we validated this framework using an online survey deployed on Reddit. 


3.1 Motivational framework 


After reviewing the related work for possible incentives for user participation in social news sites mentioned 
in Section 2, we collected a set of 55 snippets and quotes related to different motivational factors. Both 
authors then collectively used card sorting (Weller & Romney, 1988) to group related snippets together into 
26 individual motivational factors. These 26 factors were then grouped together again at a higher level, 
until we ended up with seven different mid-level categories. Finally, we combined these seven mid-level 
categories into four top-level categories: Personal (P), Social (S), Informational (I), and Website
characteristics (W)³. Section 4 describes the different levels of our motivational framework.


3.2 Survey 


To validate our motivational framework, we developed a survey with questions corresponding to each of 
the 26 motivational factors. We deployed this survey on Reddit, because it was the largest and most 
popular social news site at the time of conducting this research. Reddit attracted over 3.4 billion page 
views in August 2012⁴, and according to Alexa, Reddit pulled in 14 times more visitors in the first quarter
of 2013 than Digg, another popular social news site⁵. However, according to a recent report by Duggan
and Smith (2013), this still only corresponds to about 6% of all Internet users. 


Survey development 

Our survey consisted of six different parts⁶. Part one contained questions about the participant’s use of
Reddit: whether they have a user profile on Reddit and how often they use different functionality on Reddit,
such as posting, commenting, and voting. The next four parts corresponded to our four top-level categories
Personal, Social, Informational, and Website characteristics, with one question for each of
the 26 motivational factors grouped under these four categories. An open comment field was included at
the end of each part. The sixth and final part of the survey focused on demographics (e.g., gender, age, 
country of origin) to compare our sample characteristics to those described in earlier work. In addition, we 
asked participants for their Reddit user name and permission to crawl and analyze their user profile for 
further analysis. 


3 Individual factors will be labeled as X.y, where X is the top-level category and y is the number of the individual factor under that category.
For instance, P.1 would be the first Personal factor.

4 According to http://mashable.com/2012/09/06/reddit-pageviews-august/, last accessed April 13, 2013.

5 According to http://www.alexa.com/siteinfo/reddit.com, last accessed April 13, 2013.

6 We have made our survey questions available online at http://anon.ymiz.ed/url.


Deploying the survey on Reddit

To enable the greatest exposure to Reddit users, we decided that the best place to deploy the survey would
be on Reddit itself. There are two options for deploying a survey on Reddit: displaying it as an advertising
banner, or posting it in one or more of the many subreddits on Reddit. A subreddit is a sub-forum on
Reddit, focused on a particular topic, such as Politics, Science, or Gaming. Subreddits can be private or
public and are run by moderators who decide whether or not posts to the subreddit are on-topic. This also
means that cross-posting the survey to the largest subreddits would be a futile exercise, as it would be
removed very quickly. We therefore selected the following five on-topic subreddits to post the survey in:


• Assistance allows its members to post any kind of request for assistance from Reddit users.

• Favors allows its members to make small non-monetary requests and offers of assistance between
Reddit users.

• SampleSize is a subreddit dedicated to surveys produced for and by Reddit users.

• Self is a subreddit for discussions and questions about any kind of topic.

• SocialMedia is dedicated to listing resources for learning to better utilize and enjoy social media
sites.


Together, these five subreddits have a little under 120,000 subscribers. However, the average number of 
registered Reddit users online was around 1,000 at any given time when the survey was active. Due to the 
dynamic nature of Reddit’s voting system, our survey was not likely to stay at the top of these subreddits 
(i.e., the top 20 most popular posts) for a long time without consistent up-voting by the subreddits’ 
subscribers. Indeed, our survey remained at the top for about 14 days, with 97% of respondents
answering within 5 days. In total, we received 282 valid responses to our survey, the results of which will 
be analyzed in Section 5. 


4 A Framework of Motivational Factors for Social Media Usage 

Figure 1 shows the full framework of 26 motivational factors, organized into mid-level and top-level 
categories. The following four sections describe our four top-level categories in greater detail. Each of the
26 individual factors is explained here, organized by mid-level category. For each motivational factor, we
list the references (discussed in Section 2) that it originated from.


4.1 Personal 


4.1.1  Self-promotion & Reputation 
Self-promotion (P.1) represents the desire of a user to promote their own work, viewpoints, or interests 
(Brandtzæg & Heim, 2009). Self-promotion can be both positive (by writing insightful or intelligent
comments) and negative (by posting inflammatory messages meant to provoke an emotional response in
other users, also known as trolling).

Figure 1: Framework of 26 motivational factors for social news site usage, organized into seven mid-level
and four top-level categories. 


Social exchange (P.2) is also known as reciprocity and describes the act of rewarding positive actions with
other positive actions in response. In the context of social news sites, this could cover commenting on and
up-voting stories of other users, with similar actions in return. Users who do not believe their actions will be
reciprocated are less motivated to keep participating on the social news site (Halavais, 2009; Sadlon et al.,
2009; Rafaeli & Ariel, 2008).

Reputation (P.3) represents the positive recognition and credibility users can gain through their
actions and participation. The standard way of measuring reputation on Reddit is through karma. Users
can earn so-called ‘karma points’ by posting highly rated links as well as highly rated comments on Reddit.
By earning karma points, Reddit users gain more reputation, and such achievement systems have been
shown to be powerful incentives for participation (Clary et al., 1998; Jakobsson, 2011; Joyce & Kraut, 2006; 
Kietzmann et al., 2011; Lampel & Bhalla, 2007; Lerman, 2007; Lerman & Galstyan, 2008; Maslow, 1954; 
Rafaeli & Ariel, 2008; Ridings & Gefen, 2004). 

Status (P.4) is commonly defined as a user’s “relative standing in a group when this standing is 
based on prestige, honor, or deference” (Lampel & Bhalla, 2007, p. 437). In online communities, status is 
different from reputation in that reputation is typically used as input for gaining higher status (Lampel & 
Bhalla, 2007; Jakobsson, 2011; Maslow, 1954). For instance, more karma points could lead to greater 
prestige on Reddit or being asked to become a moderator of a specific subreddit. 

Personal growth (P.5) represents the different incentives related to personal development and 
growth, such as building self-esteem, altruism, making a positive ideological impact, or the opportunity to 
learn new skills (Arguello et al., 2006; Clary et al., 1998; Kietzmann et al., 2011; Maslow, 1954; Nov, 2007; 
Rafaeli & Ariel, 2008; Ren & Kraut, in press; Su et al., 2011). 


4.1.2 Recreation 


Curiosity (P.6) represents the desire to learn new things or learn more about interesting topics and is a 
common incentive for user participation (Kietzmann et al., 2011; Jakobsson, 2011; Brandtzæg & Heim, 
2009; Su et al., 2011).

Entertainment (P.7) is one of the most commonly mentioned affective motivations for
user participation, both explicitly (for fun) (Jakobsson, 2011; Ridings & Gefen, 2004; Nov, 2007; Brandtzæg
& Heim, 2007; Su et al., 2011; Rafaeli & Ariel, 2008) and implicitly, for procrastination (P.8) purposes
(Brandtzæg & Heim, 2009). Procrastination is commonly defined as delaying an intended activity by
undertaking another counterproductive activity instead, which is another likely motivation for spending
time on social media websites in general.


4.2 Social


4.2.1 Friendship 


Another possible motivation for participating on social news sites could be the social filtering (S.1) of the 
news stream that takes place when users collaboratively vote on which stories should make it to the front 
page (Rafaeli & Ariel, 2008; Kietzmann et al., 2011). 

The presence of friends (S.2) on the social media website or online community can be another 
powerful motivating factor and, in extreme cases, people can even be pressured into joining a website or
community because most of their friends have as well (Arguello et al., 2006; Brandtzæg & Heim, 2007, 2009; 
Lerman & Galstyan, 2008; Maslow, 1954; Ren & Kraut, in press; Ridings & Gefen, 2004; Sassenberg, 2002). 

Following friends (S.3) on a social media website is another oft-mentioned motivation for 
participating (Arguello et al., 2006; Brandtzæg & Heim, 2009; Maslow, 1954; Ridings & Gefen, 2004;
Sassenberg, 2002). This is likely to be a stronger incentive on sites where following friends is an essential 
part of the user experience. On social news sites the core activities are reading, posting, commenting, and 
voting on news articles, but following friends could still be a partial incentive. 

In addition to keeping track of old friends, making new friends (S.4) is another possible motivation
for participating more actively on social news sites (Kietzmann et al., 2011; Brandtzæg & Heim, 2009). 


4.2.2 Community 


Trust (S.5) could be a powerful motivator for active participation in online communities, such as social 
news sites (Ridings et al., 2002). This could involve trusting that a user’s contributions are taken seriously 
and reacted to with integrity by the community (Halavais, 2009). 

Socializing (S.6) is another potential incentive for participating on social news sites (Brandtzæg & 
Heim, 2009). In the context of social media—and social news sites in particular—socializing can be defined 
as interacting socially with other users in a community, often with the implicit goal of acquiring, adhering 
to, and spreading the norms and customs of that community, thereby strengthening the social cohesion of 
the community. 

Group identity (S.7), commonly expressed through the traditions and cultures of the group or 
community in question, can be another powerful motivator for participation in that group, because of its 
influence on people’s social identities (Arguello et al., 2006; Brown & Capozza, 2006; Rafaeli & Ariel, 2008; 
Ren & Kraut, in press; Sassenberg, 2002). People who feel a strong sense of belonging with a specific 
community are more likely to join and stay active in that community. 

Group size (S.8) can have both a positive and negative effect on the desire to join a community, 
such as those present on social news sites. Large communities can make it harder for users to make 
themselves heard, yet the increased anonymity can be appealing to others. In addition, large groups often 
experience a rich-get-richer effect that draws in new users (Arguello et al., 2006; Milgram et al., 1969). The 
reverse can be true for communities that are too small in size. The optimal group size can depend on factors 
such as the group’s topical focus as well as the actions typically undertaken as a community (Oliver & 
Marwell, 1998).


4.3 Informational


4.3.1 Consumption 


Information consumption (I.1) is an essential part of social media, especially social news sites. The 
consumption of information in the form of news articles and responses to them is therefore likely to be
a strong incentive for people to participate on social news sites (Brandtzæg & Heim, 2009; Fogg et al., 2003;
Ridings et al., 2002; Ridings & Gefen, 2004). 

The quality of information (I.2) posted on social news sites in the form of news articles and
comments is likely to be an important incentive for participation (Fogg et al., 2003; Rafaeli & Ariel, 2008; 
Brandtzæg & Heim, 2007). 

Online shopping (I.3) and locating relevant information to support such transactional information
needs have become an important part of the World Wide Web (Hoffman & Novak, 1997). While perhaps not
the most important source of information, social media, and social news sites in particular, could 
nevertheless be used to satisfy such transactional needs. 


4.3.2 Exchange 


The information quantity (I.4) in the form of news articles posted to a social news site could be an important
motivational factor (Arguello et al., 2006; Rafaeli & Ariel, 2008). 

Similarly, the possibility of debating (I.5) the news articles and links posted to a social news site
is also likely to motivate users to participate (Brandtzæg & Heim, 2009).

The possibility of conveniently sharing information (I.6) with other users of a social news site is
another likely motivational factor (Kietzmann et al., 2011; Brandtzæg & Heim, 2009; Ridings et al., 2002; 
Ridings & Gefen, 2004). 


4.4 Website characteristics 


The functionality (W.1) offered by a website can be an important part of what motivates people to use that
website (Fogg et al., 2003; Ridings & Gefen, 2004). Changes to the interface and functionality of Digg, a
rival website to Reddit, caused traffic to Digg to drop by 26%⁷, suggesting that the functionality offered by
a website has a strong influence on whether or not people use it or continue to use it.

Supporting synchronous communication between users of social news sites through chatting 
functionality (W.2) is mentioned often enough to warrant including it as a separate motivation (Brandtzæg
& Heim, 2009; Rafaeli & Ariel, 2008). 

Credibility (W.3) refers to both the objective and subjective believability of a message, which could
be an incentive for participating on Reddit, seeing as exchanging information is an important part of using 
Reddit (Fogg et al., 2003). 

Usability (W.4) refers to the ease of use and learnability of interface and functionality of the social
news site. Changes in the interface and functionality of a website—as described above in the case of Digg— 
tend to have a strong influence on its usability (Brandtzæg & Heim, 2009, 2007; Fogg et al., 2003). We also 
choose to group the satisfaction of affective needs under usability, such as the intrinsic desire for aesthetics 
(Rafaeli & Ariel, 2008). 


5 Results of Survey Validation 

In this section, we present the results of the survey used to validate our framework of motivations for using 
social news sites. We describe the way our participants typically use Reddit, their attitudes towards the 
different motivational factors, and the demographics of our sample.


7 According to http://readwrite.com/2010/09/23/digg_redesign_tanks_traffic_down_26, last accessed July 23, 2013.
5.1 Reddit usage


Lurkers are typically defined as people that are more likely to consume information from a website or service 
than to produce and contribute new content for it, although the exact minimum level of participation varies 
by website or service (Nonnecke & Preece, 2000). 

In the first part of our survey, we asked the participants three questions about their activity level 
with regard to posting new content, commenting, and voting on existing content. Based on their responses, 
we assigned our participants to one of three categories of users, depending on the activity type: (1) active
users, (2) casual users, and (3) lurkers. Different activities require different levels of effort, so we define
these three types differently for the three activity types, based on experience with Reddit and common 
sense. 

With regard to posting new content, active users post new content more than once a week. Casual 
users will have posted less than this, but at least once in their time on Reddit, whereas lurkers never post 
new content. With regard to commenting, active users will comment at least once a day, whereas casual
users will comment at least once a month. We expect lurkers to comment less than once a month. Voting 
on Reddit content requires the least effort. Therefore, to be classified as an active user, one would have to 
vote several times a day. Casual users vote at least once a week, with lurkers voting no more than once a 
month. Based on these distinctions, Figure 2 shows how our survey participants fall into these different 
categories. 
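
The classification rules above can be sketched in a few lines of Python. This is only an illustration: the ordered frequency scale `Freq` and the function name `classify` are assumptions introduced here, since the exact answer options used in the survey are not listed in the text.

```python
from enum import IntEnum


class Freq(IntEnum):
    """Hypothetical ordered answer scale (an assumption, not the survey's wording)."""
    NEVER = 0
    LESS_THAN_MONTHLY = 1
    MONTHLY = 2
    WEEKLY = 3
    SEVERAL_PER_WEEK = 4
    DAILY = 5
    SEVERAL_PER_DAY = 6


def classify(activity: str, freq: Freq) -> str:
    """Map a reported frequency to 'active', 'casual', or 'lurker' per activity type."""
    if activity == "posting":
        # Active: more than once a week; lurker: never; everything else is casual.
        if freq >= Freq.SEVERAL_PER_WEEK:
            return "active"
        return "casual" if freq > Freq.NEVER else "lurker"
    if activity == "commenting":
        # Active: at least daily; casual: at least monthly; otherwise lurker.
        if freq >= Freq.DAILY:
            return "active"
        return "casual" if freq >= Freq.MONTHLY else "lurker"
    if activity == "voting":
        # Active: several times a day; casual: at least weekly;
        # lurkers vote no more than once a month.
        if freq >= Freq.SEVERAL_PER_DAY:
            return "active"
        return "casual" if freq >= Freq.WEEKLY else "lurker"
    raise ValueError(f"unknown activity type: {activity}")
```

Note that the thresholds differ per activity type, so the same participant can be a lurker for posting and an active user for voting.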


[Figure 2 panels: Posting, Commenting, Voting]


Figure 2: Distribution of activity levels for posting, commenting, and voting by the Reddit users in our 
survey (N = 282). 


While the activity level typically depends on the website when defining lurking behavior, it appears that 
the participants in our survey are more likely to be active or casual users. This means that it might be 
problematic to generalize our results to the larger population of lurkers on Reddit. 


5.2 Motivational factors 


Figure 3 shows the results of the four parts of our survey corresponding to the four top-level categories
Personal, Social, Informational, and Website characteristics, with all 26 factors sorted by 
agreement. The most important motivation for using Reddit for the participants in our survey is 
entertainment (P.7): 83% of participants strongly agreed with this statement and none disagreed. This is 
also reflected in comments, such as “I do it when for fun to take a break from working [sic]” (id-93) and 
“I use reddit because it is fun” (id-164). The other two recreation-related Personal factors, curiosity (P.6)
and procrastination (P.8) are also in the top four with a respective combined agreement of 94% and 88%. 
Rounding out the top five, all with median scores of 5, are information quantity (I.4) and usability
(W.4) with a combined agreement of 88% and 84% respectively. Some telling quotes about information 
quantity include “I enjoy the wide range of content, it keeps me entertained for much longer than a site 
dedicated to a single type of content” (id-102) and “It has a wide variety of information which I like.” (id- 
93). Quotes such as “I like how simple it is for a noob like me to use. Seriously, that’s an accomplishment.”
(id-142) and “It’s simple and clean. That’s a big plus.” (id-58) demonstrate the importance of website 
usability. 
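
For readers who want to reproduce the two summary statistics used throughout this section, the following sketch computes the median score and the combined agreement for a single factor. It assumes a five-point coding (1 = ‘Strongly disagree’ to 5 = ‘Strongly agree’); the function name `summarize` and the sample responses are invented for illustration and are not survey data.

```python
from statistics import median


def summarize(responses):
    """Return median, combined agreement (%) and combined disagreement (%).

    Assumes responses are coded 1..5, where 4 and 5 count as agreement
    and 1 and 2 count as disagreement.
    """
    n = len(responses)
    agree = sum(1 for r in responses if r >= 4)
    disagree = sum(1 for r in responses if r <= 2)
    return {
        "median": median(responses),
        "combined_agreement": 100 * agree / n,
        "combined_disagreement": 100 * disagree / n,
    }


# Invented responses for one hypothetical factor.
sample = [5, 5, 4, 3, 5, 4, 2, 5, 4, 5]
summary = summarize(sample)
```

Sorting the factors by `combined_agreement` in descending order would give the ordering used in Figure 3.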

Below the top five is a group of seven factors that all have more than 50% combined agreement 
and median scores of 4. These factors mostly come from the Informational and Website characteristics
categories. Information consumption (I.1), information quality (I.2), shopping (I.3), and debating (I.5) have
a combined agreement between 55% and 75%. Quotes highlighting the importance of the information-related
aspects include “I use Reddit to follow niche news- news about things that are important to me but aren’t important
enough to a wide enough audience for the stories to end up in mainstream channels.” (id-58) and “With 19
million users (I think) there’s almost no news story that doesn’t have an eyewitness on reddit. The difference
between reading what a reporter who showed up after the fact wrote and somebody who can honestly say,
“I was there...” is very powerful.” (id-205). These suggest that the quality of first-hand reports combined
with the specialization that subreddits offer are important reasons for using Reddit. This is also reflected 
in the importance of credibility (W.3) of the information and the website itself. Website functionality (W.1),
related to usability, garnered a combined agreement of 54% and a combined disagreement of 19%. 
Participants typically mention Reddit’s many customization options as one of the great benefits of using it. 

Personal motivations related to self-promotion and reputation do not appear to be important to 
the Reddit users who participated in our survey. With the exception of social exchange (P.2) at 52% 
combined agreement, the other four factors—self-promotion (P.1), reputation (P.3), status (P.4), and 
personal growth (P.5)—rarely appear to be a reason for people to use Reddit, with median scores of 1 for 
three of these factors. No users specifically commented that they used Reddit for reasons related to self- 
promotion or reputation. In fact, one person specifically stated the opposite: “I use Reddit when I am 
avoiding thinking about things which I know need addressing. It’s a terrible coping mechanism and I was 
a better person five years ago before I ever knew it existed.” (id-145). 

The Social aspect of Reddit does not appear to be a strong motivation for users to participate, 
with friendship-related factors being valued even less than community-related factors. Median scores for the
community-related factors, such as trust (S.5), socializing (S.6), group identity (S.7), and group size (S.8),
range from 2 to 3, suggesting that the Reddit users that participated in our survey are split over how
important the community is to them. This is also reflected in their comments: some users reflect positively 
on this aspect, such as “Reddit fills holes that my real-life friends can’t fill because they’re not interested 
in all the same things I am.” (id-58); “I use it to find people who are more like me, my own community, if 
you will.” (id-184); and “Several subreddits do provide shared interests and views and a sense of community 
between me and my peer users, however, I feel my opinions on certain issues to be at odds with the overall 
user base of the website.” (id-139). Others have a negative opinion of interacting with the Reddit 
community, such as “I don’t have any social motivations for using Reddit. If anything my experience with 
Reddit regarding social interaction has been negative.” (id-145) and “Reddit is not welcoming, it does not 
feel like a group. It is very judgmental and I rarely feel comfortable.” (id-157). 

Friendship-related factors are among the lowest-rated motivational factors with median scores of 1 
for social filtering (S.1), presence of friends (S.2), following friends (S.3), and making new friends (S.4). The
highest-rated of these factors, social filtering, only has a combined agreement of 12%. The comments left 
by the participants also reflect this: “I like Reddit specifically because nobody I know uses it.” (id-142)
and “If my friends became active Redditors I would become less inclined to use the site.” (id-260). The
only factor with a median score of 1 that was not Personal or Social was chatting (W.2), which is also
the website characteristic that is most related to social behavior. In sum, the results of our survey suggest
that the social aspects of social news sites are not important for the majority of Reddit users.

Figure 3: Overview of motivational factors’ importance for all Reddit users in our survey (N = 282). Factors
are sorted by combined agreement, i.e., ‘Strongly agree’ and ‘Agree’ combined.


5.3 Demographics 

In the final part of the survey, we asked our participants some basic demographics questions, such as 
country of origin, age, and gender. This part of the survey was answered by 279 out of 282 participants. 
Figure 4 shows the main results. 


[Figure 4 panels: Country of origin; Age. Legible labels: UK 6%, Australia 2%; 35-44 yrs 6%, > 45 yrs 3%]

Figure 4: Demographics (country of origin and age) of the Reddit users in our survey (N = 279). 


Use of Reddit is predominantly an Anglo-Saxon affair, with 85% of Reddit users originating from the US, 
Canada, the UK, and Australia. Our distribution is similar to the one found in a 2011 survey of 32,756 Reddit
users, where 64.3% reported hailing from the US, 9.1% from Canada, 6.1% from the UK, and 3.3% from 
Australia. 

Reddit usage appears to be dominated by users under 35 with around 91% of all users. The average
age in our survey is 24.7 years. The age distribution shown in Figure 4 matches the 2011 survey results 
closely: the 2011 survey reported 55.5% of all users to be under 24, 35.4% between 25 and 34 years old, 
6.9% between 35 and 44, and 2.1% over 45. 

Gender was distributed evenly with 49% female and 51% male respondents in our survey. The 2011
survey shows a more skewed distribution at 18.9% female and 81.1% male. Apart from gender, this suggests that, in general,
our smaller sample seems to be representative of the larger Reddit population.


8 Survey data available at http://blog.reddit.com/2011/09/who-in-world-is-reddit-results-are-in.html, last accessed August 13,
2013.
6 Discussion & Conclusions


In this paper we explored what motivates the usage of social news sites and Reddit in particular. Based on 
a comprehensive literature review, we constructed a tiered framework of 26 motivational factors for social 
news sites, followed by an empirical evaluation using a survey of 282 Reddit users. Based on our results, 
there is one obvious question to ask: how social are social news sites? Despite the ‘social’ moniker, it would 
appear from our results that, unlike other social media (Brandtzæg & Heim, 2007, 2009), the social aspect 
of websites such as Reddit is not a powerful incentive for their continued use. The Reddit users in our 
survey consistently stated that the social aspect, especially in terms of friendship relations, was not 
important to them when using Reddit. 

In contrast, the recreational value of the information posted to Reddit as well as its quality, along 
with the powerful possibilities for customization appear to be the most powerful incentives for using Reddit. 
This suggests that, for their users, the main difference between traditional online newspapers and social 
news sites is not so much the social aspect, but rather that they can influence the placement and reception 
of news stories in their niche subreddits of interest through voting and commenting. While we cannot 
preclude any possible interaction effects between the different motivational factors and specific social features, we
believe that social news sites would therefore be better off focusing on these aspects rather than injecting 
their websites with more social features. 

It should be noted that the participants in our survey are less likely to exhibit lurking behavior 
based on their responses. This might make it problematic to generalize our results to the larger population 
of lurkers on Reddit. However, we expect many of our conclusions about Reddit usage to hold for lurkers
as well. If lurkers by definition have the lowest level of interaction with Reddit, then they are even less 
likely to be motivated by social factors or incentives related to self-promotion and reputation. Similarly, we 
have no reason to assume that reasons such as entertainment and the quality of information do not apply 
to lurkers. We therefore expect our results to apply to the lurking Reddit users as well, with some minor 
deviations. 


6.1 Future work 


In future work, we wish to triangulate our findings by performing a content analysis of the wealth of
comments we received in our survey, as well as by crawling the Reddit interaction data of the 86 users who
consented to this, to determine whether we can see the same pattern in their interaction with Reddit and
its users.

In addition, we wish to take a closer look at the different activity levels of the users in our survey 
to determine whether different levels of activity—for instance, redditors vs. lurkers—correspond to different 
motivational preferences. This also holds for demographic features: are there differences between gender
or age groups in what motivates them to use Reddit? 
Abstract

After Edward Snowden leaked classified intelligence records to the press in June 2013, government
metadata surveillance programs, and the risk that large-scale metadata collection poses to personal
information privacy, have taken center stage in domestic and international debates about privacy and
the appropriate role of government. In this paper, the authors approach these questions by drawing
upon theory and literature in both law and archival studies. This paper concludes that, because metadata
surveillance can be highly intrusive to personal privacy (in some cases even more revealing than the
contents of our communications) and because certain types of metadata are inextricably
linked with the records of our digitally mediated lives, legal distinctions that draw a line between
communications “content” and metadata are inappropriate and insufficient to adequately protect
personal privacy.
1 Introduction


Surveillance in public spaces is becoming increasingly common, whether through state or privately-owned 
closed circuit surveillance cameras, location tracking made possible by GPS chips embedded in virtually all 
cellular phones and many other electronic devices, license plate recognition systems, or even by cameras 
wielded by many of the average people on the street and built into ubiquitous technologies like phones, 
tablets, and computers (Moore, 2010; Rushin, 2011). In the public spaces of the Internet, our 
communications, browsing histories, buying patterns, and information about our social networks are subject 
to acquisition by government agents for law enforcement or national security purposes. In our modern 
society, public spaces are increasingly laden with organizational surveillance, where corporations, 
organizations, or governments are the surveillance agents, and non-organizational forms of surveillance 
carried out by individuals (Marx, 2005). Virtually all of this surveillance encompasses metadata, or 
information about the various bits of digital information being created to document our public or private 
lives, and much of this information is being ingested into, and stored in large electronic databases that are 
shared with government agents and marketing companies interested in mining information about us — 
including the attendant metadata — to achieve their respective mandates. 

Recent revelations about covert government surveillance practices in the United States, and in 
allied countries like Canada and the United Kingdom, have vigorously renewed public discussions about 
information and communication privacy. Because of the nature of the surveillance practices at issue, and 
the legal frameworks undergirding government action, many of these discussions have specifically focused 
on whether a person can maintain a legitimate expectation of privacy in metadata — or so-called “non- 
content” information — attached to their electronic activities such that the targeted (or even incidental) 
acquisition of related metadata by government agents should be subject to heightened legal protections. In 
the United States, our presence in a public space (including online spaces) has generally equated to a waiver
of any legally enforceable right to privacy for anything we do or say in those places — or in information 
about our physical location — on the premise that such information has been voluntarily disclosed to third 
parties by virtue of our very presence in public itself. 

In this paper, we critically examine the proposition that government access to metadata should be 
subject to lesser legal standards than the actual content of interpersonal communications (i.e., the actual 
words spoken or written by the parties to a communication). We draw upon theory and literature in both 
law and archival studies, as well as judicial reasoning in relevant legal decisions of U.S. courts. More 
specifically, we argue that because metadata surveillance can be highly intrusive to personal privacy (even
more revealing than the content of our communications in some cases) and because certain types of metadata
are inextricably linked with the records of our digitally mediated lives (MacNeil, 2002), legal distinctions
that draw a line between content and metadata are inappropriate and insufficient to adequately protect 
personal privacy. Of course, metadata will not generally give insight into the actual words spoken (or
typed) in a communication (and thus, in this sense, is less revealing), but it may reveal information
that the contents might not, such as the frequency of communication between two individuals or other
patterns of communication. As such, metadata can be very revealing, and even more so than the actual
contents if what NSA analysts are generally concerned with is calling patterns, connections, and actual
contact information (all contained in communications metadata). The high evidentiary value of metadata 
to government law enforcement and national security intelligence operations does provide a counterpoint 
to our argument. To deny law enforcement certain surveillance powers solely because of their efficacy is 
likewise inapt. 

However, that critique is misguided and misses the point of our central thesis. Under the 
Constitutional commitments in the United States to personal liberty from government intrusion, including 
the Fourth Amendment’s prohibition on unwarranted search and seizure of personal information by 
government agents, metadata that is inextricably linked to our digital records must be subject to the same 
protections as the records themselves, such as the contents of our communications. Because modern 
technology has “changed the game” (Moore, 2010) by removing barriers to accessing and using the personal
information of others, the law should similarly adapt and protect informational privacy when there are 
legitimate reasons to do so. 


2 Metadata 


Metadata is most commonly defined as “information about information,” or “data about data.” For our
purposes, metadata consists of human- and machine-readable assertions about resources. In our case, resources are
records in the archival sense, and so carry with them particular expectations about metadata. In the context
of electronic communications, metadata includes information about the time, duration, and location of a 
communication as well as the phone numbers or email addresses of the sending and receiving parties. It 
also may include information about the device used (make/model and specific device identification number). 
Metadata is generated whenever we use electronic devices (such as computers, tablets, mobile phones, 
landline telephones, and even modern automobiles) or services (such as email clients, social networks, word 
processing programs, and search engines). Many of these activities generate considerable amounts of 
information (metadata) about our usage of these devices or services. In most cases, service providers collect 
and retain this information in databases that often can be traced directly to an individual person. The 
migration of those messages to other systems also generates metadata, depicting the provenance of the files 
as they are copied from one server to another. 

For example, when a person makes a telephone call from a personal phone, electronic records are 
created and stored (by the service provider and/or on the device itself) that indicate the phone number
called, the time the call was made, and the length of the call. Information is also created and stored about 
the physical location of the device when the call was made. With cellular phones, location can be fairly
accurately acquired through a variety of methods, including GPS, cell tower triangulation, and the presence
of nearby WiFi signals (cf. Constandache et al., 2010). Landline phones, computer-initiated calls, and
cellular phone calls made over WiFi signals can also often be tracked precisely, due to known locations of
landline connections and Internet IP addresses. For purposes of email, metadata might include the time
sent, the address of the recipient(s), the size of the file, the existence and size of attachments, and the text
entered into the subject line of the email itself. The header, visible or invisible to the reader, is also part of
the metadata. 
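
To make the email case concrete, the sketch below reads the kinds of fields listed above from a message header using Python's standard `email` package. The message, the addresses, and the function name `header_metadata` are invented for illustration; this is not a description of how any provider or agency actually processes mail.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Invented example message; all names and addresses are hypothetical.
RAW = """\
From: Alice <alice@example.org>
To: Bob <bob@example.org>, carol@example.org
Subject: Lunch on Friday?
Date: Mon, 03 Mar 2014 09:15:00 +0100
Message-ID: <1234@example.org>

See you at noon.
"""


def header_metadata(raw):
    """Collect sender, recipients, time sent, subject and body size from a raw message."""
    msg = message_from_string(raw)
    return {
        "sender": getaddresses(msg.get_all("From", []))[0][1],
        "recipients": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "sent": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        # Size of the body in bytes; the body text itself is not stored.
        "body_size": len(msg.get_payload().encode("utf-8")),
    }


meta = header_metadata(RAW)
```

Even without retaining the body, the result records who wrote to whom, when, and about what, which is the point made in the paragraph above.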

But metadata is not just associated with electronic communications; it also serves to document
various properties of other facts, documents, or processes. For example, automated license plate recognition
systems create metadata about the locations of vehicles at certain points in time. Taking a digital
photograph often creates metadata about the location where the photograph was taken and the aperture, focal length,
and shutter speed settings of the camera. Word processing programs such as Microsoft Word can also save
metadata such as the name of the author who created the document, the date of creation, the date on which
the latest changes have been made, the name of the user who made the most recent changes, the total
number of words and pages in a document, and the total length of time that a document has actually been
edited (meaning that an employer could know exactly how much time an employee spent writing and editing a
memo).


3 Metadata and Surveillance after Edward Snowden 


After Edward Snowden leaked classified National Security Agency (NSA) documents to the press
in June 2013, questions about the nature of government collection of communications metadata took a 
prominent place on the world stage. Snowden’s first revelation was a classified court order from the secretive 
U.S. Foreign Intelligence Surveillance Court (FISC) that compelled Verizon, one of the largest U.S. 
telecommunications providers, to provide the U.S. government with all of its customers’ telephone metadata 
on an ongoing basis — encompassing landline, wireless and smartphone communications. Other disclosures 
indicate that virtually all of the major U.S. telecommunications companies were subject to similar orders. 

In a Congressional hearing, top U.S. officials claimed that they were only collecting information
about the phone numbers of the parties to communications (the sender and receiver of phone calls) and the duration
of the calls. NSA and Justice Department officials, and high-ranking Congressional representatives, also
claimed that since they were not collecting the actual contents of communications (e.g. the words spoken), 
the surveillance did not invade anyone’s reasonable expectations of privacy. The officials claimed explicitly 
that they were not collecting geolocation data (e.g. the location of the device when the call was made or 
received), but nothing in the FISC order limited the government from obtaining this kind of information as 
well. Importantly, the U.S. authorities are legally restricted from collecting the actual contents of
Americans’ communications under the U.S. Constitution (although, as practices revealed in the
aftermath of Snowden’s disclosures indicate, this restriction may not mean as much in practice). However,
government agencies are legally permitted to collect the contents (and metadata) of the communications of non-U.S. persons around
the world without any prior judicial authorization.

If the evidentiary value of a record in the digital environment is defined by its metadata, then we 
have something that is inextricably linked to the record. Without metadata we do not have the record — 
we do not have evidence that is forensically sound and authentic. As in the cases mentioned above, record- 
level metadata is about dates, persons, and locations (MacNeil, 2002). Without these we have no authentic 
evidence, but we can also argue that collecting dates, names of persons, and locations is a violation of
privacy. That is: the context is content. 
4 Problems with Binary Fourth Amendment Theory


Legal definitions of privacy in the Fourth Amendment search context have often been crafted to force 
conclusions about potential privacy violations based on binary distinctions: either a form of investigation 
or information gathering by government agents constitutes a search or it does not (Kerr, 2013). The binary 
nature itself is not problematic (in fact it may be highly desirable). However, certain strict applications of
the third-party doctrine and the public/private dichotomy may improperly restrict Fourth Amendment 
protections of personal privacy. 

Traditional trespass-based decisions, recently reinvigorated by the Supreme Court’s decision in 
United States v. Jones (2012), have determined whether a search has occurred on the basis of whether a 
property interest has been infringed by a government agent. The two-pronged Katz reasonable expectations
of privacy test (which requires that (1) an individual must have exhibited a subjective expectation of privacy
and (2) the expectation must be one that society is prepared to recognize as reasonable or legitimate)
(Katz v. United States, 1967), despite the allure (or dangers) of its “hypothetical reasonable person”
standard, has failed to modernize in pace with investigative technologies used by law enforcement around
the country and remains subject to binary distinctions of legal significance. Fourth Amendment law is
riddled with binary distinctions granted legal significance by the courts, including the public/private
dichotomy and the third-party doctrine (or the idea that once information is released to any third party,
privacy interests vis-à-vis the government, when acquiring the information from the third party, are
waived).

Indeed, despite calling for empirical evidence (at least on its face) of societal expectations of privacy 
when examining the constitutionality of criminal investigations conducted by government agents, this 
hypothetical reasonable person has rarely (if ever) been a stand-in for relevant social science research on 
what members of the contemporary society actually expect(ed) (see Blumenthal et al., 2009); rather, courts
have applied the test as a proxy for the work of social scientists and socio-legal scholars. It has been 
suggested that the prevalence of binary dichotomies in Fourth Amendment case law is a consequence of 
courts (and lawyers) attempting to find “easy lines to draw in court” (Selbst, 2013). However, the
difficulties faced by the courts in applying the Katz test uniformly, problematic application of the third-party
doctrine in cases involving government use of emerging technologies, and a resounding call by commentators
that Fourth Amendment legal theory is in chaos (and has been for some time), suggest that the lines may
not be so easy to draw after all. Perhaps the time has come to rethink Fourth Amendment theory and reduce
the legal significance of some of the problematic binary distinctions that have plagued court decisions for 
years, such as certain applications of the third-party doctrine that would lessen the privacy interests in 
certain types of metadata. 

In light of the opinions of the Justices in Jones, which signal the possibility that a majority of the 
Justices might be open to revisiting Fourth Amendment theory in light of modern technologically-aided 
police practices (Kerr, 2013), we argue for advancing a normative approach to privacy in Fourth 
Amendment jurisprudence that is sensitive to context (not bound by purely binary distinctions) and the 
increasingly revealing capacity of metadata surveillance, especially when such information is collected, 
stored, and mined in the aggregate. 


5 Defining and Defending Privacy 


Throughout this paper, we define informational privacy as the right to control access to and uses of personal 
information (Moore, 2010; 2007). This definition explicitly recognizes that individuals should have some 
rights to control not just access to personal information, but also some subsequent uses of that information 
(Moore, 2010), even after disclosure to third parties in certain circumstances. This definition will be informed 
by the mosaic theory of the Fourth Amendment (the idea that multiple searches for information by 
government agents, even if each is justified on its own, may become unjustified under the Fourth 
Amendment by virtue of the greater intrusion made possible by aggregating and analyzing the information
as a larger set, which may reveal patterns and sensitive information not obtainable through any individual
search and potentially not relevant to the purposes of the individual searches themselves), which has received
renewed attention in the wake of the decisions in United States v. Jones (2012) and United States v. Maynard
(2010). This version of the mosaic theory, adopted from federal practices attempting to balance disclosing
documents to the public under the Freedom of Information Act (FOIA) while preserving national security 
interests, is premised on the idea that any individual piece of information is generally less useful than when 
combined with other pieces of information. We argue that a person’s right to limit access to and use of 
certain personal information (e.g. a person’s current or past geographic location) that has not been kept 
strictly “secret” (by virtue of the fact that it was available in a public space) should still, in some
circumstances, remain legally enforceable under the Fourth Amendment’s guarantee of freedom from 
unreasonable search or seizure. 

In essence, we are arguing for a right to privacy in certain information (specifically metadata that 
forms an essential part of a record about an identifiable individual) that, when viewed discretely or in the 
aggregate, is generally not qualitatively or quantitatively available to the public at large (or, as Judge
Ginsburg of the Circuit Court for the District of Columbia phrased it, such information is not actually or 
constructively exposed to the public (United States v. Maynard, 2010)). The aggregation of the metadata 
associated with our electronic communications and digital records of our physical movements over a 
substantial time period allows law enforcement to easily discover information that is both qualitatively and 
quantitatively different than what is knowingly and voluntarily exposed to the public at large, even though 
it is (in essence) just an aggregation of distinct bits of information individually exposed to the public. 
Tracking a person’s cell phone or logging their Internet browsing patterns also allows the government to 
track individuals while they are inside a private building or in the sanctity of their homes — distinctly 
private information. 
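
The aggregation argument can be illustrated with a minimal sketch. The records, the area labels, and the function name below are hypothetical and chosen only for illustration; they do not describe any actual law enforcement tool or data set. The point is simply that a handful of timestamped location records, each unremarkable on its own, can suggest where a person sleeps and works once they are grouped by time of day.

```python
from collections import Counter
from datetime import datetime

# Hypothetical location records: (ISO timestamp, coarse area label).
# Each record alone says little; the labels are illustrative only.
pings = [
    ("2013-03-04T02:10", "cell_A"),
    ("2013-03-04T13:30", "cell_B"),
    ("2013-03-05T01:45", "cell_A"),
    ("2013-03-05T14:05", "cell_B"),
    ("2013-03-06T03:20", "cell_A"),
    ("2013-03-06T19:40", "cell_C"),
    ("2013-03-07T02:55", "cell_A"),
]


def most_common_area(records, start_hour, end_hour):
    """Return the area seen most often with start_hour <= hour < end_hour, or None."""
    counts = Counter(
        area
        for stamp, area in records
        if start_hour <= datetime.fromisoformat(stamp).hour < end_hour
    )
    return counts.most_common(1)[0][0] if counts else None


# Aggregating by time of day suggests facts that no single record reveals.
likely_home = most_common_area(pings, 0, 6)        # overnight hours
likely_workplace = most_common_area(pings, 9, 17)  # daytime hours
print(likely_home, likely_workplace)
```

With a month of such records rather than a few days, the same grouping could be extended to weekly routines or recurring visits, which is the kind of pattern the mosaic theory is concerned with.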

In this pursuit, we will examine the proposition made by Justice Sotomayor in Jones that the time 
has come to rethink the legal significance of allowing a third party access to personal information when 
considering privacy interests in public spaces. By restricting the third-party rule in our Fourth Amendment 
analysis, such that any release of information to a third party is not necessarily a complete and total waiver 
to all forms of access and use by anyone at all, we respect the drastic changes in technological possibilities 
and their proper role in government investigations while maintaining checks on improper abuse of authority. 


6 The Third Party Doctrine 


The third-party doctrine has been described as “the Fourth Amendment rule scholars love to hate” (Kerr, 
2009). For years, it has been subjected to voluminous amounts of criticism, both by legal scholars and state 
courts (Kerr, 2009). The Supreme Court has upheld the rule, holding that citizens “assume the risk” that 
what they disclose to a third party will be transferred on to the government, but has not explicitly defended 
it (Kerr, 2009). And now, after Jones, criticism of the rule has reached the Supreme Court itself. 

In its early years, the third-party doctrine was applied in cases involving undercover agents and 
confidential informants (Kerr, 2009). These cases held that defendants could not claim Fourth Amendment 
violations based on conversations with government agents (sometimes wearing wires) because “the
Fourth Amendment does not protect ‘a wrongdoer’s misplaced belief that a person to whom he voluntarily 
confides his wrongdoing will not reveal it’” (Kerr, 2009, quoting Hoffa v. United States, 1966). In later cases, 
the Court applied the doctrine to business records. In United States v. Miller (1978), the Supreme Court 
held that a bank depositor does not have any reasonable expectation of privacy in financial information (in 
the form of deposit slips, checks, and bank records) because such information was conveyed voluntarily to 
the bank and “exposed to their employees in the ordinary course of business.” As such, the court found 
that,


“The depositor takes the risk, in revealing his affairs to another, that the information will be
conveyed by that person to the Government.... [T]he Fourth Amendment does not prohibit the 
obtaining of information revealed to a third party and conveyed by him to Government authorities, 
even if the information is revealed on the assumption that it will be used only for a limited purpose 
and the confidence placed in the third party will not be betrayed” (United States v. Miller, 1978). 


In her concurrence in United States v. Jones (2012), Justice Sotomayor stated that the time had come for 
Fourth Amendment jurisprudence to discard the premise that legitimate expectations of privacy could only 
be found in situations of near or complete secrecy. Sotomayor argued that people should be able to maintain 
reasonable expectations of privacy in some information voluntarily disclosed to third parties. The opposite 
and historical view of the court, Sotomayor stated, was “ill-suited to the digital age, in which people reveal 
a great deal of information about themselves to third parties in the course of carrying out mundane tasks” 
(United States v. Jones, 2012). Sotomayor considered that logs of phone calls, text messages, websites 
visited, email correspondence, purchase histories from online retailers, and geolocational information were 
all forms of information that were technically disclosed to third parties through mundane tasks, but where 
such disclosure should not constitute waiver of all privacy interests (United States v. Jones, 2012). 
“Whatever the societal expectations,” Sotomayor stated, these forms of information 


“can attain constitutionally protected status only if our Fourth Amendment jurisprudence ceases 
to treat secrecy as a prerequisite for privacy. I would not assume that all information voluntarily 
disclosed to some member of the public for a limited purpose is, for that reason alone, disentitled 
to Fourth Amendment protection” (United States v. Jones, 2012). 


If one purpose of the Fourth Amendment’s warrant requirement is to prevent government agents from 
engaging in fishing expeditions, then the third-party doctrine, when applied to aggregate collection and
mining of metadata, would clearly frustrate the original purpose and intent of the Amendment itself. 

As stated by Justice Sotomayor, the situation with prolonged geolocational tracking is different 
precisely because the technological surveillance “evades the ordinary checks that constrain abusive law 
enforcement practices: ‘limited police resources and community hostility’” (United States v. Jones, 2012, 
citing Illinois v. Lidster, 2004) and allows the government to obtain personal information about individuals 
that is qualitatively and quantitatively different in kind from what would otherwise be discovered. The
likelihood that, in the case of physical tailing, such a time consuming and resource intensive investigation 
would be carried out regularly without a sound basis is very small. Police are very unlikely to devote such 
time and resources to this kind of visual surveillance except in cases that really warrant it. On the other 
hand, the ease and convenience of obtaining records from wireless providers could allow government agents 
virtually unfettered ability to conduct this sort of surveillance in a wide variety of cases, including “fishing 
expeditions” not based on any level of suspicion (probable cause or otherwise). 

Admittedly, this position could keep some important investigations from proceeding as efficiently
as they otherwise might, purely because a department lacks the resources to conduct extensive visual
surveillance. But requiring a warrant, based on affirmation of probable cause, before allowing government
agents to collect and analyze such extensive digital information, should not be a serious impediment to 
most investigations and would help restrict this sort of surveillance to legitimate investigations. 
Additionally, other exceptions to the Fourth Amendment’s warrant requirement, such as the emergency 
doctrine (United States v. Goldenstein, 1972; Roberts, 1975), would continue to ameliorate these concerns 
in practice when time is of the essence. 

However, by limiting a strict application of the third-party doctrine, new questions emerge about 
where lines should be drawn between permissible and impermissible tactics in other contexts. For example, 
what are the important differences (if any) between aggregating geolocational information, bank records, 
“private” communication or messages on a social network like Facebook, web browsing or search histories, 
or electronic purchase histories collected and archived over time? The mosaic theory, originally announced
by Judge Ginsburg in United States v. Maynard (2010), may begin to help us sort out these difficult 
questions. 


7 Public Surveillance, the Mosaic, and the Fourth Amendment 


Some scholars have claimed that recent (and even not so recent) advances in digital technologies and 
surveillance capabilities mean that we should rethink whether we can maintain any legitimate expectations 
of privacy while out in public — or in “public facts.” In United States v. Jones (2012), Justice Sotomayor 
proposed that the third-party doctrine should be abandoned (or at least rethought) in the face of confronting 
Fourth Amendment challenges related to investigative use of new technologies. Justice Alito’s separate 
concurrence in that case expressed concern about the robustness of the “reasonable expectations of privacy 
test” — even while advocating its use in that case — because of the potential that the widespread use of new 
surveillance technologies could resign the populace to subjectively expect less privacy than should be 
afforded under the Constitution (United States v. Jones, 2012). 

Indeed, geolocational tracking technologies — which have now been used by law enforcement 
agencies for some time — allow law enforcement to easily compile thousands of pages of information about 
our present and past travels — in very exacting detail — and to mine that information indiscriminately for 
patterns (in United States v. Jones (2012), for example, prosecutors presented over 2,000 pages of data 
about Jones’s location over a 28-day period sourced from a physical tracking device installed in the rear
bumper of a vehicle Jones regularly drove). The NSA’s metadata surveillance practices, recently exposed 
to greater scrutiny by Edward Snowden, allow the government to conduct similar analysis with the calling 
and communications histories of everyday citizens, even those not suspected of committing any crime. 

Courts have also clearly stated that Fourth Amendment law has failed to keep pace with advancing 
technological possibilities. In one recent Ninth Circuit case, the court stated: 


“The extent to which the Fourth Amendment provides protection for the contents of electronic 
communications in the Internet age is an open question. The recently minted standard of electronic 
communication via e-mails, text messages, and other means opens a new frontier in Fourth 
Amendment jurisprudence that has been little explored” (Quon v. Arch Wireless, 2008). 


In United States v. Maynard (2010) (the predecessor Court of Appeals decision to United States v. Jones 
(2012)), the court held that the government violated the suspects’ Fourth Amendment rights when agents
tracked a vehicle 24 hours a day over a 28-day period. Importantly, while announcing the “mosaic
theory”, the court found that: 


“... unlike one's movements during a single journey, the whole of one's movements over the course
of a month is not actually exposed to the public because the likelihood anyone will observe all those 
movements is effectively nil... [and] the whole of one's movements is not exposed constructively even 
though each individual movement is exposed, because that whole reveals more—sometimes a great 
deal more—than does the sum of its parts” (United States v. Maynard, 2010). 


The court compared this case of prolonged modern surveillance with prior national security cases where the 
government regularly invoked the “mosaic theory” to shield certain otherwise public records from disclosure 
under the Freedom of Information Act because, “What may seem trivial to the uninformed, may appear of 
great moment to one who has a broad view of the scene” (United States v. Maynard, 2010, citing CIA v. 
Sims, 1985). This concern was later voiced loudly by the Justices in the Supreme Court’s decision in United 
States v. Jones (2012), which upheld the decision of the Circuit Court (but on trespass grounds, rather than 
under the Katz reasonable expectations of privacy test). 

Combining the third-party doctrine with the modern realities of massive data collection possible
because of the ubiquitous nature of contemporary communications devices means that location data, even 
historical data, is becoming much easier for law enforcement to obtain without the need to secure a warrant 
supported by probable cause, even without planting physical devices and risking committing physical 
trespass. Indeed, the police in Jones did obtain historical geolocation information from Jones’s wireless
provider, but the government chose at trial to rely on the data collected through a physical tracking device
installed on Jones’s vehicle. The present ability of law enforcement to so easily amass and mine such enormous
amounts of personal information through simple technological tools and coordination with service providers 
(such as wireless service providers, email providers, or social network service providers) calls for an examination
of current Fourth Amendment theory, the reasonable expectations of privacy test, and the third-party 
doctrine. 


8 Finding a Legal Basis for Metadata Privacy 


Since Justice Harlan announced a two-part test in his concurring opinion in Katz v. United States (1967),
whether a person maintains a right to privacy, for Fourth Amendment search purposes, has been
based on whether any subjective expectation of privacy maintained by the individual asserting the privacy 
interest is “one that society is prepared to recognize as reasonable” (Katz v. United States, 1967). Generally 
in the United States, courts have found that information released to the public could not be the subject of 
any legitimate expectation of privacy under this test. From 1967 until the United States v. Jones (2012)
decision, the reasonable expectation of privacy test largely displaced the prior focus on whether the
government had violated a property right, such as by committing trespass, in conducting a search. Justice
Scalia’s majority opinion in United States v. Jones (2012), however, reinvigorated the trespass doctrine for 
searches where physical trespass had occurred, while allowing for the continued use of the Katz test when 
non-trespassory interests are allegedly violated. 

Despite the radical shift that some of the dicta in the United States v. Jones (2012) decision might 
indicate for the future of Fourth Amendment doctrine, Justice Sotomayor’s call for greater protections for some
activity occurring in the public sphere is not the first time the idea has been suggested in the courts. In 
the Katz v. United States (1967) decision itself, Justice Stewart stated that 


“What a person knowingly exposes to the public, even in his own home or office, is not a subject 
of Fourth Amendment protection. But what he seeks to preserve as private, even in an area 
accessible to the public, may be constitutionally protected” (Katz v. United States, 1967). 


In that case, the government had attached a listening device to the exterior of a public phone booth, and had
recorded the defendant making phone calls. The court found that Katz maintained a reasonable expectation 
of privacy in his conversations while inside the phone booth, even though it was in a public place, because 
the court felt that 


“a person in a telephone booth may rely upon the protection of the Fourth Amendment. One who 
occupies it, shuts the door behind him, and pays the toll that permits him to place a call is surely 
entitled to assume that the words he utters into the mouthpiece will not be broadcast to the world. 
To read the Constitution more narrowly is to ignore the vital role that the public telephone has 
come to play in private communication” (Katz v. United States, 1967). 


The court continued its “discrediting” of the view that only trespass could raise constitutional questions,
elaborating that 


“... once it is recognized that the Fourth Amendment protects people — and not simply ‘areas’ —
against unreasonable searches and seizures, it becomes clear that the reach of that Amendment
cannot turn upon the presence or absence of a physical intrusion into any given enclosure” (Katz
v. United States, 1967). 


Reading this language alongside Sotomayor’s concurrence in United States v. Jones (2012), parallels begin 
to emerge. The expectation that shutting the glass door to a public phone booth makes the conversation 
private is entirely consistent with the proposition that emails sent to an associate, purchase histories shared 
only with the online merchant, geolocational information shared only with a cellphone service provider, or 
a social networking status update visible only to a select group of friends (due to actively setting and
maintaining privacy settings to ensure such limited publication), could also be considered legitimate 
contexts where a reasonable expectation of privacy vis-a-vis the government could adhere (Newell, 2011). 

However, the historical reliance on the third-party doctrine would presumably discredit these 
otherwise reasonable expectations merely because the information was disclosed to an intermediary (Google, 
Facebook, Verizon, T-Mobile, Amazon) or a select group of friends. Thus, the government is free to demand 
and subpoena this information from these intermediaries without obtaining a warrant or attesting in court 
to probable cause. However, the “vital role” that the public telephone played in facilitating private 
communication (even in public spaces) in 1967 has been superseded by a variety of electronic wireless 
communications technologies (cell phones, email, text messaging, and private messaging on social media 
websites) that also collect and transmit a wealth of data (such as geographic coordinates) that find no easy 
corollary in the Katz analogy. 

Some lower federal courts have begun to question a strict application of the third-party doctrine as 
well. In 2010, the Sixth Circuit addressed the question of whether the government violated the Fourth 
Amendment when agents compelled an ISP to turn over the contents of the defendant’s emails without first 
obtaining a warrant (United States v. Warshak, 2010). In that case, the Sixth Circuit held that, even though 
the subscriber agreement allowed the ISP to access the contents of its clients’ emails in certain 
circumstances, “the mere ability of a third-party intermediary to access the contents of a communication 
cannot be sufficient to extinguish a reasonable expectation of privacy” (United States v. Warshak, 2010). 
The court found that this conclusion was consistent with the Katz v. United States (1967) holding, because 
the telephone service company in the prior case also had a legal right to listen to phone calls in certain 
cases. 

The United States v. Warshak (2010) court also differentiated the facts in that case from those in 
United States v. Miller (1978), because the third-party ISP was merely an intermediary rather than the 
intended recipient (as the bank was in Miller). Under the rationale in this case, the government could not 
demand the information from the intermediary corporation or service provider, but the conclusion would 
not necessarily extend to information released by the recipients of the communication, such as the email 
recipient or Facebook friend. Whether this was the right result, or merely a step in the right direction, 
remains the subject of some controversy. However, as evidenced by the recent indication by the five 
concurring justices in United States v. Jones (2012) (Sotomayor was the most explicit, but Alito’s opinion 
can also be read this way) that they may be willing to rethink Fourth Amendment theory (Slobogin, 2012), 
the time may be ripe for further challenges to precedent. Indeed, the fact that the United States v. Jones 
(2012) decision followed from the introduction of the mosaic theory in the lower court’s decision signals 
that the justices may be willing to entertain this issue in coming years. 

The recognition of the Court in Katz v. United States (1967) itself of this relationship between the 
Fourth Amendment, private communications, and technological change, provides ample support for the 
proposition that these new forms of private communication (and the variety of additional opportunities 
they provide, both to government and individuals) should be carefully protected as well, preserving the idea 
that new technologies should receive carefully considered protections under the Fourth Amendment. 


9 Conclusion


In archival science, context is everything. Metadata provides essential context for many records, especially 
digital records created by electronic communications and the use of digital devices like smartphones, 
computers, and tablets. Context, as provided by metadata, is vital to the authenticity of these records. 
Without understanding where a record originated (when, by whom, where) by reference to certain metadata 
attached to that record, we cannot claim evidentiary or forensic authenticity; we want to understand the
authenticity of a document so that we might understand the original act or fact. Because context, in the form
of metadata, is for the most part inextricably linked to the digital record, a record does not
properly exist (in an authentic state) without the metadata.

Artificial legal distinctions between the content of electronic communications and the associated 
metadata do not properly respect the essential connection between these two sources of data. These 
distinctions also obscure the reality that large-scale metadata surveillance and data-mining provide 
government agents with personal information about people’s communications that is often just as revealing
as the actual words spoken — the “content” of a communication. Because metadata surveillance can be 
highly intrusive to personal privacy and because certain types of record-level metadata (including dates, 
persons, and locations) are inextricably linked with the records of our digitally mediated lives, legal 
distinctions that draw a line between communications “content” and metadata are inappropriate and 
insufficient to adequately protect personal privacy. The law should account for these deficiencies, and 
protect record-level metadata with the same protections as content — making metadata surveillance requests 
subject to judicial authorization under the Fourth Amendment’s warrant requirement. After all: the context
is content.

This paper explores participative Social Information Behavior in the educational domain. The goal is to
capture a picture of current information practices in the Social Web. The focus is on the “places” and
the scale of the Social Web in the domain, the communication dynamics and structure of communities
participative Social Information Behavior is of relevance in the domain: The volume of openly accessible


1 Introduction


The Social Web can be seen as a communication infrastructure that enables and allows for (potentially 
unrestricted) n:m-communication. Its diffusion can be assessed as a fundamental paradigm shift of 
communication patterns of individuals as well as of the society as a whole. Social Media offer the possibility 
to satisfy communication needs which were previously too costly or impossible to be satisfied. Now, 
everybody can be a “media outlet” (Shirky 2008). The Social Web is primarily a leisure-time-based 
phenomenon of predominantly young internet users. Nevertheless, there is an ongoing discussion addressing 
the question of whether and how the potential of the Social Web can be transferred to professional contexts (Shirky
2010, Evans 2010, Hester 2010). This paper focuses on Social Information Behavior in educational contexts.
The research motivation is to gain insight into how students in education-related degree programs
(future educators) actively use Social Media in their own professional educational context.

Media usage and information practices are in flux. Search engines have changed the way users
access information. Furthermore, the Social Web has widely expanded the universe of available knowledge 
itself (Ramakrishnan & Tomkins 2007). In addition, the Social Web has created new possibilities for 
personal, communal and public information sharing (Shirky 2010: 173). The adoption of Social Media is
very advanced with regard to personal communication and self-expression in Social Online Networks (SONs)
(Allfacebook.de 2013) and product- or service-related information behavior (eMarketer 2013). In contrast,
the significance and role of the Social Web for professional contexts is often unknown. Therefore, the
question remains whether the Social Web is also of significance in the educational domain, as the current state
concerning places, communication structures and quality of user-generated content is largely unclear.

This is the starting point of this work. We want to get an overview of the Social Web in the domain 
at hand and also a picture of the communication dynamics and structure in corresponding communities as 
well as insights into the specificities and quality of such online communication. Apart from the scientific 
research interest, results of this investigation should be especially of interest for specialized information 
providers, as they seem to be rather slow in adapting to the current developments. Their content base is 
usually limited to the “old” channels of professional information. Hence, the communicative paradigm of 
the Social Web is barely addressed. Services are still widely restricted to a content delivery metaphor. One 
can argue that these restrictions are necessary and a safeguard to deliver high quality information. 
Nevertheless, it seems possible and plausible that there is a widening gap between users’ information 
practices and the services offered by specialized information providers. The Web, the Social Web and the 
Future Sensor Web (O’Reilly & Batelle 2009) are very disruptive technologies. One needs to know how
users behave and what they expect in order to be able to adapt to a rapidly changing environment. Here, 
insights into participative information behavior can be seen as a basis on which professional information
providers can build to improve or secure the usefulness and therefore significance of their services in
the long run. However, as long as aspects like the quality of user-generated content or the role of online 
communities are widely unclear, specialized information providers are barely able to adapt to changing user 
behavior and needs. 

The paper is structured as follows. First, the concept of Social Information Behavior is outlined 
and related to the educational domain. This provides a conceptual basis for this investigation. A draft of 
current research concerning Social Media usage in study related contexts illustrates the relevance of the 
topic at hand. Following that, the research approach and methods are described and first results are 
presented. On this basis, the significance of participative Social Information Behavior is discussed. The
paper closes with suggestions of possible adaptation approaches for specialized information providers.


2 Social Information Behavior 


Social Information Behavior can be divided into two aspects: Firstly, the receptive use of user-generated 
content as a resource to satisfy information needs. An example is the selection of Wikipedia entries in search 
engine result pages to answer ad hoc information needs. Secondly, active participation in Social Media: a) 
to answer current information needs, e.g. by asking a specific question in a question-answering service or a 
forum and b) to build up knowledge by ongoing conversations with peers in communities. The latter aspect 
can be connected to the concept of “communities of practice” (Wenger & Snyder 2000). 

As described in section 1, participation in Social Media is mainly restricted to leisure-based contexts
(Busemann & Gescheidle 2011). Nevertheless, according to an online panel survey of Gibs (2009) there is a 
segment of online users who see Social Media as a “core to finding new information”. Morris et al. (2011) 
surveyed 624 participants using social network sites and conclude that social networks are valued for their 
ability to provide opinions and recommendations. Kleimann et al. (2008) investigated the use of Social 
Media in educational contexts. Results indicate that a substantial fraction of students use social 
communities to communicate with peers about study related aspects. According to an investigation of 
Selwyn (2009), 4% of Facebook’s “wall activity” of 909 undergraduate students could be related to studies 
and academic aspects. Lee et al. (2012) explored the question of resource selection in academic search tasks 
and state that question-answering services accounted for 5.2%. Results of an interview study of students 
conducted by Hrastinski & Aghaee (2012) show that Social Media are rarely used for academic needs. 
According to the authors, Social Media are primarily used for short answers and questions and for 
coordination of group work in academic contexts. Kim et al. (2011) surveyed 446 undergraduate students 
and found that different Social Media types are used in different information seeking contexts. For instance, 
social network sites are preferred for everyday life purposes whereas question-answering services are used
for leisure as well as academic needs. Geist et al. (2012) investigated relevance and quality aspects of 
different types of search engine based search results for education-related information needs. They conclude 
that user-generated content, which represents 21% of the top 20 results, has a lower relevance probability 
than specialized information, but is on a par with results from professional information providers which do 
not have an explicit educational mandate. 

This short overview of investigations in the field indicates that user-generated content and
communities play an important role in information behavior. In addition, the named investigations illustrate 
the diversity of research questions which can and probably need to be addressed in the field of Social 
Information Behavior. Apart from factors that influence the choice of media, characteristics of use cases, 
communication patterns, and quality aspects (e.g. usefulness of content and outcomes) need to be addressed 
as well. 


3 Research Design 


Interestingly, investigations often restrict their analysis to popular sites like Facebook or Twitter (cf. section
2). Beyond this, it is often unclear which other “places” “build” the Social Web. This is where this research 
starts. We address the following research questions: 


A) Places and scale of the Social Web in the domain: Which domains and online services form the 
Social Web in the educational domain? How many users are actively engaged or subscribed in 
communication? What is the magnitude or volume of communication? Answers to these questions 
give an impression of the significance of Social Information Behavior in the field with regard to 
audience reach and amount of user-generated content. 


B) Dynamics and structure of communities: How do communities develop with regard to user numbers 
and contribution quantity? How are discussions structured? What are the topics of discourse? Such 
data give an indication of basic characteristics of communities in the field. 


C) Quality, pragmatics and success of communication: What types and pragmatics of information needs 
and answers constitute online discussions? Are discussions helpful, e.g. do they provide actionable 
suggestions? What is the role and what are the characteristics of socio-emotional facets of 
communication? Insights into such attributes of online discussions allow for a judgment of the 
quality of online conversations with regard to aims, results and sequence of interaction. 


Part A aims for a broad overview of the whole Social Web, B for a statistical assessment of specific 
communities and C for a judgment of threads. Because of the multiplicity of perspectives and the wealth of 
data under investigation, the implementation and employment of research methods and systems that help 
to answer the above-mentioned research questions can be seen as a challenge in themselves. Therefore, the 
following arguments are of interest not only concerning measured data and results but also with regard to 
the methodological level. 


4 Approach and Methods 


The subsequent description follows the structure of the research questions. In principle, the methods 
employed, systems implemented and categories drafted can be regarded as independent of each 
other. In the analysis, however, they build upon one another. The output of methods addressing research question A 
partly serves as input data for B etc. The whole design follows a path from broad to narrow with regard to 
the analyzed samples and from general to specific concerning the depth of the analysis. The following 
illustration gives an overview of the research approach. 

The stages run from broad and general (A) to narrow and specific (C): 

A) Places and scale of the Social Web in the domain (focus: the whole Social Web) 
   - Identification of relevant websites 
   - Preparation of profiles for relevant domains 

B) Dynamics and structure of communities (focus: domains and threads) 
   - Crawling and saving of data of selected domains 
   - Statistical analysis of communication 

C) Quality, specificities, pragmatics, success of communication (focus: threads and postings) 
   - Development of a categorization scheme 
   - Intellectual coding of postings and threads 


Figure 1: Research approach 


A) Places and Scale of the Social Web in the Domain 


In order to get an overview of the Social Web in the educational domain as a whole, the investigation 
starts with an identification of relevant websites. Following that, summaries of socio-demographic 
data and communication specific features of identified domains are prepared. To identify relevant 
websites, a set of 45 terms and phrases was used to query Google. The query set consisted of the 
most popular internal and external queries of the German Education Server (bildungsserver.de), a 
popular specialized information provider and search service in the field, as well as of names of 
education related study paths. For each query the top 100 search results were examined and 
pertinent forums, blogs, communities and portals were inspected in order to determine if they were 
suitable for further analysis. The inspected websites were selected for further analysis when the topics 
of these sites (or a subsection of the sites) matched topics of education related study paths and 
contained active discussions which related to educational topics as well as to the field of study. The 
resulting sample is what we define as the Social Web in the domain. To get an overview of the 
characteristics of this domain specific Social Web, profiles of each selected domain were prepared. 
The profiles encompass data with regard to their number of members, topics and contributions of 
each of these “places”. In addition, qualitative aspects were also evaluated. We specifically checked 
if discussions were moderated, and scanned the domain to identify its specific content focus and 
target group(s). If possible, the users with the largest numbers of contributions were also identified. 


B) Dynamics and Structure of Communities 


A subset of the identified Social Web, that is, domains whose contents and volume of 
communication seemed to be especially interesting, was specified for further analysis with regard 
to communication dynamics and structure. The domains selected were Lehrerforen.de, Paedagogik- 
klick.de and Referendar.de. Here, we implemented a crawler based analysis tool. The tool has been 
developed on a Linux system and is based on the following Open Source software components: 
Cassandra (cassandra.apache.org), Nutch (apache.nutch.org), HTTrack (www.httrack.com), 
MySQL (www.mysql.com), and Nginx (nginx.com). Tools were configured with Java and Perl 
scripts. In the analysis we want to get an insight into the communication dynamics, seen from a 
community lifecycle perspective (Iriberri & Leroy 2009). Therefore, the numbers of new users and 
contributions are analyzed per year. This gives an insight into the “health” of the 
community. In addition, statistical data with regard to the number of posts per thread, post length, 
number of authors per thread and fraction of threads with only one posting give an indication of 
the basic characteristics of communication. A high percentage of rather short-lived threads, or 
threads with a low number of replies, can be seen as an indicator of probably shallow and not very 
successful communication which one would not connect with the notion of knowledge building 
communities (Wenger & Snyder 2000). Such an assessment can be underpinned with an analysis of 
how contributions are distributed across users. One can assume that the higher the mean of contributions 
per author, the deeper the connection to the community. Finally, discussion topics are investigated. 
A frequency analysis of the topic terms and their concordances gives an indication 
of the topical demands and needs of the communities and shows how far the discussions 
can be connected to specialized information. 
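
The structural measures named above can be illustrated with a minimal sketch. It assumes that the crawled postings are available as (thread id, author, text) tuples; this layout, the function name and the sample postings are hypothetical and do not describe the actual analysis tool. 

```python
from collections import defaultdict
from statistics import mean


def thread_statistics(posts):
    """Compute basic communication measures.

    posts: iterable of (thread_id, author, text) tuples (hypothetical layout).
    """
    posts_per_thread = defaultdict(int)
    authors_per_thread = defaultdict(set)
    posts_per_author = defaultdict(int)
    word_counts = []
    for thread_id, author, text in posts:
        posts_per_thread[thread_id] += 1
        authors_per_thread[thread_id].add(author)
        posts_per_author[author] += 1
        word_counts.append(len(text.split()))  # whitespace tokens as a rough word count
    single = sum(1 for n in posts_per_thread.values() if n == 1)
    return {
        "posts_per_thread": mean(posts_per_thread.values()),
        "authors_per_thread": mean(len(a) for a in authors_per_thread.values()),
        "posts_per_author": mean(posts_per_author.values()),
        "words_per_post": mean(word_counts),
        # threads that never received a reply
        "single_post_fraction": single / len(posts_per_thread),
    }


# Invented sample postings, for illustration only.
sample_posts = [
    ("t1", "anna", "How do I register for the state exam?"),
    ("t1", "ben", "You register at the examination office."),
    ("t1", "anna", "Thanks a lot"),
    ("t2", "carl", "Any tips for lesson planning?"),
]
stats = thread_statistics(sample_posts)
```

Counting words by splitting on whitespace is a simplification; a real corpus would probably need cleaning of quoted text and signatures first. 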


C) Quality, Specificities, Pragmatics and Success of Communication 


In addition to the statistical analysis, an in-depth content analysis is important to get insights into 
qualitative aspects, pragmatics and success of communication. In fact, only by triangulating both 
does one seem to be capable of delivering a meaningful picture. As intellectual assessment and coding 
is costly, a subset of 50 threads from two of the domains selected in B and another forum included in 
A will be subjected to content analysis. Quality analysis of user-generated content is an active 
research field, in which both automatic and manual techniques are employed. For instance, Agichtein 
et al. (2008) and Moturu & Liu (2011) aimed to quantify the quality of contributions with the help of 
feature, usage or relationship (link structure) analysis based on automatic scoring and machine 
learning rating algorithms. Such methods could be aligned to our statistical approach in B. There 
is also active research based on intellectual analysis of user-generated content. For example, Savolainen 
(2011) investigated criteria for assessing quality and credibility by a manual analysis of postings in 
Finnish internet discussion forums. Results indicate that judgments from other messages could serve 
as an indicator of message content quality. Such mentions rely primarily on author credibility 
and to a smaller extent focus on the judgment of the content itself. Willemsen et al. (2011) showed 
that there is a strong connection between content characteristics and perceived usefulness of 
Amazon online reviews. The focus of the analysis here is on the single contributions and, beyond 
that, on the development and characteristics of whole threads. Therefore, the categorization scheme 
needs to address the following levels: I) Initiation of the discussion, II) Course of the discussion and 
III) Outcome of the discussion. Concerning I and II the single contributions form the elementary 
units of the analysis. III is based on the judgment of the thread as a whole. As written, we are still 
in the process of finalizing the coding scheme and analysis system. The categorization scheme is 
developed collaboratively and iteratively by four researchers. The scheme encompasses the following 
categories: 


I.   Type of question: fact oriented, personal estimation 
     Intent: problem solving information need, uncertainty reduction, aim for suggestions, aim 
     for emotional support 

II.  Answer: factual information, opinion, suggestion, further inquiry, gratitude, meta-discussion 
     Socio-emotional aspect: affirmation, opposition, social support, hostility 
     Quality: new topical aspect 
     Topic 

III. Explicit solution 


As one can see, the coding scheme is complex. Judgments with regard to III (outcome) are also 
dependent on aggregated values of II. The system is tailored to the analysis of knowledge building 
processes. Especially the items new aspect, further inquiry and opposition can be connected to socio- 
cultural (exploration of knowledge, the diminishing of knowledge asymmetries), socio-cognitive 
(cognitive conflicts) and cognitive elaboration perspectives of learning and knowledge building (cp. 
Vygotsky 1979, Piaget 1979, Kop & Hill 2008, Scardamalia & Bereiter 1994). Following four cycles of 
development and testing, the current state of the categories is largely stable. Therefore, we think it is 
worthwhile to present some preliminary results from our analysis with regard to quality, pragmatics and 
success of communication, too. Still, one needs to consider that the data here are at a tentative stage. 
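
For readers who want to apply the categorization scheme listed above in their own tooling, the following sketch encodes its levels and categories as a plain data structure. The names `CODING_SCHEME` and `is_valid_code` are hypothetical. Recording the outcome as a yes/no flag is an assumption, and the free-text category Topic is left out. 

```python
# Hypothetical encoding of the categorization scheme; not the authors' analysis system.
CODING_SCHEME = {
    "I": {  # initiation of the discussion
        "type of question": {"fact oriented", "personal estimation"},
        "intent": {
            "problem solving information need",
            "uncertainty reduction",
            "aim for suggestions",
            "aim for emotional support",
        },
    },
    "II": {  # course of the discussion
        "answer": {
            "factual information", "opinion", "suggestion",
            "further inquiry", "gratitude", "meta-discussion",
        },
        "socio-emotional aspect": {
            "affirmation", "opposition", "social support", "hostility",
        },
        "quality": {"new topical aspect"},
    },
    # Assumption: the outcome is recorded as a simple yes/no flag per thread.
    "III": {"explicit solution": {"yes", "no"}},
}


def is_valid_code(level, category, value):
    """Return True if value is an allowed code for the given level and category."""
    return value in CODING_SCHEME.get(level, {}).get(category, set())
```

A check such as `is_valid_code("II", "answer", "opinion")` can help to catch typing errors when several coders record their judgments independently. 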


5 Results 


A) Places and Scale of the Social Web in the German Educational Domain 


The query process to identify and select relevant websites for further investigation of education-related 
online communities revealed that most discussion sources are to be found within forums and question- 
answering services. Blogs and social networks barely appeared within the top 100 search results and thus 
were excluded from the analysis. In order to define the characteristics of the domain specific Social Web, 
a total of twenty-one German forums were selected and profiled (cp. table 1). These forums are exclusively 
or partly focused on users who aim at, are currently undertaking, or have already concluded an education 
related study path. Question-answering services are for now excluded due to their broad topical scope. 


Forum                          # of members   # of posts    Moderated   Registration for       Members- 
                               in thousand    in thousand   forum       active participation   only-area 

paedagogik-klick.de            2              72.1          Yes         Yes                    Yes 
referendar.de                  21             275.7         Partly      Yes                    No 
lehrerforen.de                 15.5           304.7         Yes         Yes                    No 
4teachers.de                   902.7          387.9         Partly      Yes                    No 
grundschultreff.de             1.7            59            No          Yes                    Yes 
fachlehrerseite.de             4              16.1          Yes         No                     No 
lehrerforum.de                 3.9            12.8          Yes         Yes                    No 
schule-ratgeber.de             77.4           1             Yes         Yes                    No 
studis-online.de               No data        602.8         No          No                     No 
uni-protokolle.de              216.5          1,908         Partly      Yes                    No 
uni-pur.de                     No data        4.7           No          No                     No 
studienservice.de              37             1,000         No          Yes                    No 
studieren-info.de              0.6            1.3           Partly      No                     No 
studiengangs-verzeichnis.de    0.6            5.5           Yes         No                     No 
erzieherinnenausbildung.de     0.7            21.3          No          Yes                    Yes 
sowi-forum.com                 13.8           333           Yes         Yes                    Yes 
krankenschwester.de            41.2           309.5         Yes         Yes                    Yes 
pflegeboard.de                 34.6           180.6         No          Yes                    No 
e-hausaufgaben.de              212.6          1,422         Yes         Yes                    No 
vorhilfe.de                    50.3           No data       Yes         Yes                    Yes 
dirk-bechtel.de                1.2            2.4           Yes         Yes                    Yes 


Table 1: Profiles of German forums in the educational domain with focus on education-related study 
paths (February - May 2013) 


The forums are divided as follows: Fourteen forums aim at a broader target group with users of various 
backgrounds within the area of education, while seven forums focus only on students, pupils, and trainees. 

The forums differ in terms of the number of members, the total amount of topics 
and posts, and the communication principles. The lowest number of registered members is approximately 
600 for studieren-info.de and studiengangs-verzeichnis.de. In contrast, 4teachers.de lists 902,700 
members. A significant factor for a membership subscription seems to be the ability to actively 
participate in forum discussions, as 71% of the analyzed forums require a log-in in order to post or 
comment. A different approach can be seen for passively accessing forum content. While only 33% of the 
forums include certain topical areas that can be viewed by members only, the majority of the forums’ 
content can be openly viewed or crawled by search engines at all times. 

Generally, the number of members is proportional to the total amount of posts in a forum. Thus, 
forums with more members have a greater variety of topics and more posts in discussion. However, 
four of the analyzed forums deviate from this pattern. The largest deviation is found for schule-ratgeber.de, 
with 77,400 members and a total of only 1,000 posts. In this forum, users need to log in to benefit from 
uploaded lesson plan material in other parts of the website. Overall, four websites offer 
material exchange in addition to their forum activity. Three of these forums show a relatively high 
number of members compared to the total amount of posts. 

The forum with the highest member activity is 4teachers.de, whose most active member has written 
about 7,000 posts throughout his ten-year membership. The forum focuses on topics related to study 
fields and lesson planning, includes the possibility to share materials, and facilitates the exchange of 
experiences between students as well as offering student mentorship by employed teachers. In contrast, the 
member list of studiengangs-verzeichnis.de indicates that the highest number of contributions 
by a single member is 12 posts. The forum contains approximately 5,500 posts and displays a total of 
about 600 members. As no registration is required for active participation, one can infer that most 
comments are written by anonymous users. 

One indication that long-term user engagement is being promoted is an actively 
participating moderator team. In fifteen of the twenty-one forums, moderators are at least partly engaged 
in the discussions. These moderator-supported forums have about four times as 
many registered members as the non-moderated forums. 

A more detailed profile inspection of lehrerforen.de, paedagogik-klick.de and referendar.de (cp. Fig. 
2) shows that the three forums are varying in their characteristics: 


• Lehrerforen.de with its 15,500 members and a total of 304,700 posts in 35,000 threads aims its 
discussions at students in education-related study paths as well as trainees and teachers. The forum 
is supported by an active moderator team. It is openly accessible to non-members, but requires 
registration prior to posting. The top ten members contributed an average of 3,600 posts and the 
main topical foci of the community are questions or discussions about teaching degrees 
and examinations, as well as lesson planning, didactic or educational counselling, and personal 
concerns. 


• Paedagogik-klick.de is a comparatively small forum with about 2,000 members and 72,100 posts in 
4,700 threads. The forum includes discussion areas for members only and is more intimate among 
its regular users, whose top ten contributors have written about 4,000 posts throughout their membership. 
The forum is strongly moderated and influenced by a group of specialists who contribute to or lead 
discussions. The forum aims at various professional groups within the educational area, such as 
teachers, kindergarten nurses, educators, students and trainees as well as parents. The topics 
cover education and learning, traineeship and career, expertise discussions, as well as various 
questions related to student life. 


• Referendar.de comprises 21,000 members and 275,700 posts in 27,000 threads. In contrast to 
Paedagogik-klick.de, Referendar.de focuses on a narrower target group of students in education- 
related study paths, trainees and young teachers. The forum is only partly moderated, in so far 
as there are six global moderators who supervise the threads but participate only to a limited extent 
in the discussions. Forum topics encompass various questions, experience exchange and concerns 
around traineeships in the teaching environment. These not only include lesson planning, 
educational counselling and organizational matters, but also extend to financial questions, open job 
positions, etc. The forum’s most active members have posted about 3,400 comments. 


Figure 2: Thread examples for the forums Referendar.de, Paedagogik-klick.de, and Lehrerforen.de 


In total, the 21 forums show a remarkable number of more than 1.6 million registered members (cp. table 
2) and thus, along with an unknown number of anonymous forum users, play a significant role in the 
German educational domain in the Social Web. 


# of forums                                21 
Approximate # of members in thousand       1,637.3 
Approximate # of posts in thousand         6,920.4 
Moderated forum                            Yes: 11   No: 6    Partly: 4 
Registration for active participation      Yes: 16   No: 5 
Members-only-area                          Yes: 7    No: 14 


Table 2: Overall figures of German forums in the educational domain with focus on education related study 
paths 


The open access to forum topics for non-members and the possibility to crawl forum content through search 
engines suggest that forum content within the educational domain is not only perceived by members but 
also widely accessible and relevant to non-members, who have the opportunity to satisfy their information needs 
by passively consuming information. As a whole, approximately 6.9 million posts (cp. table 2) concerning 
topics related to educational themes are publicly available on the Internet. Thus, the Social Web is highly 
significant for education-related information behavior. 
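
The overall figures in table 2 follow from the forum profiles in table 1 by simple aggregation. The sketch below reproduces this step under stated assumptions: the `Forum` record and the function `overall_figures` are hypothetical names, entries without data are skipped in the sums, and the member figure for erzieherinnenausbildung.de is taken as 0.7 thousand, the reading that is consistent with the reported total. 

```python
from collections import Counter
from typing import NamedTuple, Optional


class Forum(NamedTuple):
    name: str
    members_k: Optional[float]  # registered members in thousand, None = no data
    posts_k: Optional[float]    # posts in thousand, None = no data
    moderated: str              # "Yes", "No" or "Partly"
    registration: str           # registration required for active participation
    members_only: str           # forum has a members-only area


FORUMS = [
    Forum("paedagogik-klick.de", 2, 72.1, "Yes", "Yes", "Yes"),
    Forum("referendar.de", 21, 275.7, "Partly", "Yes", "No"),
    Forum("lehrerforen.de", 15.5, 304.7, "Yes", "Yes", "No"),
    Forum("4teachers.de", 902.7, 387.9, "Partly", "Yes", "No"),
    Forum("grundschultreff.de", 1.7, 59, "No", "Yes", "Yes"),
    Forum("fachlehrerseite.de", 4, 16.1, "Yes", "No", "No"),
    Forum("lehrerforum.de", 3.9, 12.8, "Yes", "Yes", "No"),
    Forum("schule-ratgeber.de", 77.4, 1, "Yes", "Yes", "No"),
    Forum("studis-online.de", None, 602.8, "No", "No", "No"),
    Forum("uni-protokolle.de", 216.5, 1908, "Partly", "Yes", "No"),
    Forum("uni-pur.de", None, 4.7, "No", "No", "No"),
    Forum("studienservice.de", 37, 1000, "No", "Yes", "No"),
    Forum("studieren-info.de", 0.6, 1.3, "Partly", "No", "No"),
    Forum("studiengangs-verzeichnis.de", 0.6, 5.5, "Yes", "No", "No"),
    # Assumption: 0.7 is the reading of this member figure that matches the table 2 total.
    Forum("erzieherinnenausbildung.de", 0.7, 21.3, "No", "Yes", "Yes"),
    Forum("sowi-forum.com", 13.8, 333, "Yes", "Yes", "Yes"),
    Forum("krankenschwester.de", 41.2, 309.5, "Yes", "Yes", "Yes"),
    Forum("pflegeboard.de", 34.6, 180.6, "No", "Yes", "No"),
    Forum("e-hausaufgaben.de", 212.6, 1422, "Yes", "Yes", "No"),
    Forum("vorhilfe.de", 50.3, None, "Yes", "Yes", "Yes"),
    Forum("dirk-bechtel.de", 1.2, 2.4, "Yes", "Yes", "Yes"),
]


def overall_figures(forums):
    """Aggregate forum profiles into overall figures, skipping missing values."""
    return {
        "forums": len(forums),
        "members_k": round(sum(f.members_k for f in forums if f.members_k is not None), 1),
        "posts_k": round(sum(f.posts_k for f in forums if f.posts_k is not None), 1),
        "moderated": Counter(f.moderated for f in forums),
        "registration": Counter(f.registration for f in forums),
        "members_only": Counter(f.members_only for f in forums),
    }


figures = overall_figures(FORUMS)
```

Because two forums report no member data and one reports no post data, both totals should be read as lower bounds. 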


B) Dynamics and Structures of Communities 

This section presents the results of the statistical analysis of the three selected domains Lehrerforen.de, 
Paedagogik-klick.de and Referendar.de. After several months of implementation and fine tuning of the 
crawler, the final crawling was executed in June and July of 2013. To avoid duplicate content, filters were 
employed to exclude e.g. print-views. In addition, only selected sub-forums were indexed. Inclusion criteria 
were based on an explicit study or student-related focus of the sub-forums. Thus, for example, on 
Paedagogik-klick.de the sub-forum “chats-The world outside of education!” was excluded. Another example 
is the sub-forum “School leadership and management” on Lehrerforen.de. As a result, the sample of the 
statistical analysis is the subset of the content which is explicitly on-topic and aligned to the target group 
of students of education-related study paths. The following table gives an overview of the sample. 


                 Lehrerforen.de   Paedagogik-klick.de   Referendar.de   Sum 
Topics           4,189            1,092                 1,429           6,710 
Posts            34,657           9,530                 11,284          55,471 
Active Users     2,901            470                   1,693           5,064 


Table 3: Sample overview 


The data indicates a substantial amount of on-topic communication and also a significant number of 
contributors who actively express information needs or participate in problem solving and knowledge 
generation. 


Communication dynamics 


Crawling data shows that the communities started years ago and can therefore be regarded as 
established domains. The oldest entry in the crawling index is from Lehrerforen.de, posted on October 25, 
2002. The oldest post indexed from Referendar.de is dated March 29, 2005. The oldest entry from 
Paedagogik-Klick.de was posted on January 7, 2007. 
The following illustration gives an overview of new postings and participating authors per year. 


[Figure 3 consists of three line charts, one each for Lehrerforen.de, Paedagogik-Klick.de and Referendar.de. 
Each chart plots the number of postings and the number of users per year on a vertical axis from 0 to 7,000.] 


Figure 3: New postings and participating authors per year of the forums Referendar.de, Paedagogik-klick.de, 
and Lehrerforen.de 


Data indicates that all three communities are beyond their peak of participation. In accordance with 
Iriberri & Leroy’s (2009) lifecycle perspective, one can argue that the communities are somewhere between 
a mature state and death. The reasons for the decline of participation are unclear. Probably there is a shift 
to other services or types of social websites, e.g. Social Networks. However, we do not want to paint too 
bleak a picture here. There have still been plenty of new contributions and new authors in recent 
years. Nevertheless, one could offer some suggestions to the operators of these domains, as argued 
by Iriberri & Leroy (2009). Operators could initiate events, provide permeated control and rewards, and 
establish and support new subgroups in their communities. 
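
The yearly aggregation behind such a lifecycle view can be sketched as follows. The sketch assumes that every crawled posting carries an author name and a posting date; the function name `yearly_activity` and the sample records are hypothetical. 

```python
from collections import defaultdict
from datetime import date


def yearly_activity(posts):
    """Count postings and distinct participating authors per calendar year.

    posts: iterable of (author, posting_date) pairs (hypothetical layout).
    """
    postings = defaultdict(int)
    authors = defaultdict(set)
    for author, posted_on in posts:
        postings[posted_on.year] += 1
        authors[posted_on.year].add(author)
    return {year: (postings[year], len(authors[year])) for year in sorted(postings)}


# Invented sample records, for illustration only.
sample = [
    ("anna", date(2011, 3, 1)),
    ("anna", date(2011, 5, 9)),
    ("ben", date(2011, 6, 2)),
    ("anna", date(2012, 1, 15)),
]
activity = yearly_activity(sample)  # {2011: (3, 2), 2012: (1, 1)}
```

A falling count of postings together with a falling count of distinct authors over several years is the pattern that would be read as a community moving past its peak. 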


Communication structure 

An analysis of the communication structure delivers first insights into the quality of discourse and knowledge 
building processes. A first summary of the data denotes substantial communication and exchange which 
can, at least on the surface, be connected to problem solving and knowledge building. The table below gives 
an overview of the communication structures within the three forums. 


                                         Lehrerforen.de   Paedagogik-klick.de   Referendar.de   Sum     SD 
Posts per topic                          8.3              8.7                   7.9             8.3     0.3 
Posts per user                           11.9             20.3                  6.7             13.0    5.6 
Length of postings in words              91.8             88.5                  80.3            88.9    4.8 
Fraction of threads with one posting     9.3%             16.6%                 13.9%           11.5% 
(“dead threads”) 
Authors per thread                       5.0              3.6                   4.3             4.6     0.6 


Table 4: Overview of communication structures 


With a mean of roughly eight posts per thread over all forums, one can assume that communication exhibits 
some substance. Postings, too, are not merely short statements. Instead, with a mean length of 80-90 words 
per posting, they usually encompass several sentences. Concerning the number of authors per thread, 
multiple perspectives are generally involved in communication. Although the picture is relatively 
uniform across all three forums at large, the results for authors per thread, posts per user and “dead 
threads” also indicate some structural differences in communication between the three forums. Over 
all forums, the number of posts per user hints at a relatively substantial level of participation. In addition, the 
fraction of “dead threads” implies that in most cases the initiation of communication is successful. The 
number of authors per thread shows that there is a multiplicity of users and therefore perspectives involved 
in communication. The visible differences hint at a deeper community involvement of members of 
Paedagogik-klick.de, as the number of posts per author is roughly double that of 
Lehrerforen.de and three times that of Referendar.de. In contrast, the 
initiation of discourse has a much higher probability of failure on Paedagogik-klick.de than in the other 
forums. We see this overview just as a first step into communication structure analysis. The limits of the 
employed assessment are easy to see. Firstly, there is no baseline for orientation. We think this work 
can be a start to build one. Secondly, more in-depth analysis and segmentation is needed. Here, more 
elaborate patterns of analysis are currently being developed. Thirdly, the data can barely stand on its own and is 
not self-explanatory. Triangulation with more in-depth qualitative data, as proposed in chapter 4c, can lead 
to better insights into quality aspects and success factors of such communication. 
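
The ratio measures in table 4 can be recomputed from the sample figures in table 3, as the following sketch shows. Two points are inferences from the numbers and are not stated in the text: the Sum column for posts per user matches the unweighted mean of the three forum values, and the SD column matches a population standard deviation over these three values. The names `SAMPLE` and `summarize` are hypothetical. 

```python
from statistics import mean, pstdev

# Sample figures from table 3: (topics, posts, active users)
SAMPLE = {
    "Lehrerforen.de": (4189, 34657, 2901),
    "Paedagogik-klick.de": (1092, 9530, 470),
    "Referendar.de": (1429, 11284, 1693),
}

posts_per_topic = {f: posts / topics for f, (topics, posts, users) in SAMPLE.items()}
posts_per_user = {f: posts / users for f, (topics, posts, users) in SAMPLE.items()}


def summarize(values):
    """Return (mean, population standard deviation), each rounded to one decimal."""
    values = list(values)
    return round(mean(values), 1), round(pstdev(values), 1)


topic_summary = summarize(posts_per_topic.values())  # (8.3, 0.3)
user_summary = summarize(posts_per_user.values())    # (13.0, 5.6)
```

With only three forums, the standard deviation is a rough indicator of spread at best, which fits the caution about the missing baseline expressed above. 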


Topics 

Finally, a frequency analysis of the top 100 terms of thread topics for the three communities implies a topical 
alignment towards professional aspects of (education for and work in) the education system (teacher, school, 
exam) and teaching subjects (Math, History, English, German). The titles indicate that the pragmatics of 
most discussions are focused on the factual clearance of knowledge gaps with regard to certain named 
topics (“child literature”), the fostering of one’s own professional development (“teacher or into the 
economy?”) or suggestions to accomplish work-related tasks (“lesson planning”). The following table gives 
a rough outline of the discussion topics. 


Lehrerforen.de     Paedagogik-klick.de    Referendar.de 
Referendariat      Uni                    Staatsexamen 
Ref                Dipl                   Bayern 
NRW                Frankfurt              Examen 
Examensarbeit      Austausch              StEx 
Staatsexamen       Studis                 Lehramt 
Examen             Thread                 Deutsch 
Englisch           Ausbildung             Englisch 
Deutsch            Erzieherin             Ergebnisse 
Thema              Kinder                 Geschichte 
Klasse             Bildung                EWS 


Table 5: Most frequent topic terms 


The topics of Lehrerforen.de and Referendar.de are very similar and focus on the fostering of one’s own 
professional development. Paedagogik-klick.de, the smallest community, has a broader topic spectrum and 
discussion topics often focus on didactical and pedagogical aspects of learning and teaching. As a whole, the 
topic analysis denotes a strong focus on topics that can be connected to specialized information and validates 
the results from the analysis in 5a. 
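
A term frequency analysis of thread titles of the kind reported in table 5 can be sketched in a few lines. The stop word list, the tokenization rule and the sample titles are assumptions made for illustration; the text does not say how titles were tokenized or filtered, and concordances are not covered here. 

```python
import re
from collections import Counter

# Hypothetical stop word list; the actual filtering is not documented.
STOP_WORDS = {"der", "die", "das", "und", "im", "in", "für", "zum", "oder"}


def top_terms(titles, n=10):
    """Return the n most frequent lower-cased terms in a list of thread titles."""
    counts = Counter()
    for title in titles:
        for token in re.findall(r"\w+", title.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)


# Invented sample titles, for illustration only.
titles = [
    "Referendariat in NRW",
    "Examensarbeit im Referendariat",
    "Staatsexamen oder Referendariat?",
]
most_frequent = top_terms(titles, 1)  # [("referendariat", 3)]
```

Lower-casing merges spelling variants but also hides abbreviations such as “Ref” and “StEx”, so an analysis of real titles would probably need a mapping of abbreviations to full terms. 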


C) Quality, specificities, pragmatics and success of communication 


As written, the intellectual analysis with regard to the quality, specificities, pragmatics and success of 
communication is in a kind of pre-test stage. Nevertheless, we decided to present preliminary results to 
show first tendencies and also to reveal if our categorization scheme is able to capture knowledge building 
and the satisfaction of information needs. 

The sample consists of six threads with a total of 60 posts collected from three forums. Two threads 
were selected from Lehrerforen.de, one thread from Referendar.de and three threads from Studis-online.de. 
Studis-online.de was chosen because it deviates from the other forums, including Paedagogik-Klick.de, with 
regard to moderation and registration processes. In contrast to the other forums, Studis-online.de is not 
moderated and participation does not require prior registration (cp. table 1). 


Initiation of discussion 

The first interest is in the type of questions. Are they fact-oriented or do they ask for a personal estimation 
from the community? Data indicates both. An example of a fact-oriented question is asking about 
possibilities to use the same certificates for different kinds of academic degrees. An example of asking for 
a personal estimation is asking whether one’s personal attributes fit with being a 
teacher and therefore whether one should study for an education-related degree. Although these categories are not 
mutually exclusive, results indicate that both types of questions are usually not interwoven. In one case, a 
participant started a discussion with a narration of his exam nerves and a forthcoming examination date. 
He does not ask an explicit question but just states that he will keep the others informed of his further 
preparations. The second post in the thread showed social support. Then, the initiator expressed his 
gratitude. So, we can see this as an example of seeking emotional support without explicitly asking for it. 
In the whole corpus and also in the subset of the six threads selected for manual categorization, the length 
of the questions (measured in word counts) is far above the mean length of all posts. Thus the questions 
provide a rich information need context that could be of interest to everyone who provides 
information services. With regard to intent, we detect all categories: problem solving information need (#3), 
uncertainty reduction (#1), aim for suggestions (#1), aim for emotional support (#2). Interestingly, we 
found a combined occurrence of uncertainty reduction, aim for suggestions and aim for emotional support 
and no overlap between emotional support and problem solving information need. Possibly, there is a 
distinction between questions that clearly convey personal pragmatics and questions that are focused 
on cognitive problem solving. 


Course of the discussion 

With regard to answers, results indicate that all categories are present. Our data shows a mix of information 
(#17), opinion (#25), suggestion (#17), and further inquiry (#8). Meta-discussion (#6) took place in half of 
the threads. With regard to gratitude (#4) we get the same picture. Emotional aspects and cognitive 
conflicts are also visible, but on a relatively low scale. We see no case of explicit affirmation and few 
oppositions (#4). This indicates a low level of cognitive discourse in the forums. Social support (#12) is 
more frequent than hostility (#6). New topical aspects (#25) are visible in nearly half of all posts. Therefore 
it seems that communication in these forums can be connected strongly to socio-cultural perspectives of 
knowledge building but only in few cases to socio-genetic perspectives. With regard to topic, fundamental 
changes throughout the discussion could not be detected. Instead, postings often broadened or narrowed 
the subject of the preceding post. 


Outcome of the discussion 

A discussion’s outcome is often unclear. Only in one case, the initiator of the discourse explicitly confirmed 
that he is very grateful and that the discussion really helped him to reduce his uncertainty. Interestingly, 
the discussion still continued afterwards. Therefore, as written, the judgment of the outcome is often not 
directly measureable. Nevertheless, our categorization scheme allows for an estimation of emotional or 
cognitive values of threads. Emotional value can be approximated e.g. with the number of postings and 
authors expressing social support. A cognitive or knowledge building value can be calculated by the numbers 
and proportions of information, suggestions, opinions, further inquiries, affirmations and oppositions. Here, 
evaluation schemes and concrete measures still need to be worked out. Nevertheless, the data of our pretest 
makes it clear that such an undertaking is feasible in principle and, in our opinion, very worthwhile. 
Hence, further research will concentrate on this area. 
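
As a minimal sketch of how such values might be computed, the following Python fragment counts supportive postings and authors and derives counts and shares for the cognitive categories. The post structure (dictionaries with `author` and `codes` keys), the function names and the sample thread are hypothetical; they are not part of the study's instruments or data.

```python
from collections import Counter

# Categories from the coding scheme that bear on knowledge building.
COGNITIVE = ("information", "suggestion", "opinion",
             "further inquiry", "affirmation", "opposition")


def emotional_value(posts):
    """Return (postings, distinct authors) that express social support."""
    supportive = [p for p in posts if "social support" in p["codes"]]
    return len(supportive), len({p["author"] for p in supportive})


def cognitive_profile(posts):
    """Return {category: (count, share of all cognitive codes)}."""
    counts = Counter(c for p in posts for c in p["codes"] if c in COGNITIVE)
    total = sum(counts.values())
    return {c: (counts[c], counts[c] / total if total else 0.0)
            for c in COGNITIVE}


# Hypothetical coded thread (illustrative only, not data from the study).
thread = [
    {"author": "a", "codes": ["further inquiry"]},
    {"author": "b", "codes": ["information", "social support"]},
    {"author": "c", "codes": ["opinion", "social support"]},
    {"author": "b", "codes": ["suggestion", "social support"]},
]

support_posts, support_authors = emotional_value(thread)
profile = cognitive_profile(thread)
```

How such counts and shares should be weighted into a single score is exactly the open question of evaluation schemes named above, so the sketch deliberately stops at the raw figures.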


6 Discussion 


Finally, what are the results of this analysis? How can the research approach be categorized? What are the 
consequences for specialized information providers? To answer these questions, we will first summarize the 
data with regard to the research questions. Then, we will provide an assessment of the methods and an 
outlook for further research. Finally, we will try to provide suggestions for specialized information 
providers. 

With regard to the research questions, results can be summarized as follows. The investigation 
reveals the places and scale of the Social Web in the domain. Astonishingly, at least as far as the open Web is 
concerned, Social Online Networks and blogs play only a subordinate role. Visible education-related 
communication is taking place in forums and question-answering services. Therefore, our analysis of forums 
reveals and describes the openly visible Social Web in the domain (cp. table 1). Here, the sheer amount of 
roughly 1.6 million registered users and approximately 6.9 million postings clearly shows that participative 
Social Information Behavior is of relevance in the domain. 

With regard to dynamics and communication structure of communities, data indicates that 
communication is substantial and that there is usually a multiplicity of perspectives involved in 
communication. Thus, these forums can indeed be regarded as a kind of knowledge building community. 
Taking a deeper perspective, the content analysis indicates that knowledge building processes can be 
assigned to socio-cultural rather than socio-genetic perspectives of learning. In addition, socio-emotional aspects 
play an important role in communication, as users are actively aiming for such support. At the same time, 
in most cases, the outcome of discussions seems to be unclear. 

The research approach taken and methods employed in this investigation need to be assessed from 
two perspectives. On the one hand, we see considerable novelty. As argued, to our knowledge it is 
the first attempt to measure the Social Web for a whole domain. The investigation is costly, but it produces 
not only new insights but also “products” that can be of value for information providers. Our overview 
data and crawler architecture can be used as a basis to build a Social Web search engine or Social Media 
monitoring system. Even our statistical analysis and the coding scheme provide new ideas for assessing 
user-generated content. From a knowledge building perspective, the data indicates that our coding scheme 
can be seen as a solid base to develop granular and elaborate measures to gauge the “knowledge building 
value” of threads. This is where our further research will focus. 

On the other hand, with regard to the statistical and intellectual analysis of domains, threads and 
communities, research instruments are still in a preliminary stage. In the case of the in-depth manual content 
analysis, data collection is, at the time of submitting this article, not very advanced and far from being 
finished. Therefore, results here are only preliminary. 

Finally, with regard to suggestions for specialized information providers, first propositions were 
already presented in the paragraph above. In our crawl with roughly 55,000 indexed postings, the term 
“bildungsserver” (in English: education server) appears in 53 of them, that is, in roughly one in a thousand postings. 
If we keep in mind that the education servers play an important role in the infrastructure of specialized 
information providers in Germany in the field, then we must conclude that they are disconnected from the 
open Social Web. So, what can be done about this? Maybe it is time to get in touch with each other. With 
our data published here, specialized information providers can already grasp the German Educational Social 
Web as a whole. Data on influencers is also available. Concerning strategies of social 
media communication, one can distinguish three possible approaches. The first option, which was already 
mentioned, is to provide users with access to the knowledge in the Social Web in the domain, e.g. by 
building up a specialized search engine. Second, to get a grasp of users’ needs, trends and developments 
in the Educational Social Web, specialized information providers could employ systematic Social Media 
monitoring. Again, our crawler could serve as a blueprint here, but a variety of other tools and 
services is also available. A third approach would be active participation in the communication. Providers could 
get in contact with operators and influencers in forums or build up their own specific service-based profiles 
and get involved in communication. 


Abstract 

From Tehran Square to Gezi Park, Twitter is an emergent tactic of protestors in the public square. Our 
work uses the theoretical framework of contentious politics and its human geographic extension to 
examine the role of “place” in Twitter-based networks of resistance. We examine Twitter 
traffic about local instantiations of Occupy Wall Street across eight cities. The study addresses mutual 
communications between Twitter participants in hashtags related to each of these local instantiations. 
This work explores the role of place as a constitutive component of these networks. To do so, we employ 
descriptive statistics and chi-square tests to examine the significance of user-defined metadata regarding 
place to the exchanges between users within a network. We conclude that place matters and point to 
future directions in computational and traditional qualitative analysis, spatial-temporal studies of social 
media, and the effects of locational propinquity for network development. 

1 Introduction 


Social media platforms are increasingly utilized as tools to enable and engage in public protest. Twitter is 
one such social media platform that is gaining attention as a tool in protestors’ repertoire of contention, or 
the toolkit of strategies and tactics used to resist oppression (Tarrow, 2011). Beginning with Tunisia in the 
Arab Spring of 2011, and followed by the Spanish indignados, Occupy Wall Street (Gerbaudo, 2012), and 
most recently protests trending in Turkey, Twitter is used to organize, enable, and report protest activities and to 
increase the visibility of protester solidarity. In short, Twitter is a place to exchange information (Kwak et 
al., 2010). As an increasingly present condition of protest, services like Twitter are thus of increasing 
importance to the ways in which oppression can be challenged and thus worthy of careful interdisciplinary 
conversation. 

Information scientists and geographers have much to teach one another. Geographic Information 
Science (Goodchild, 1992) is dedicated to the treatment of explicitly spatial information, often at the expense 
of other forms. Information scientists have a stronger grasp on the multitude of ways in which information 
can be treated, but largely leave spatial and place-based interpretations to their geographic brethren. A 
third group, human geographers, would argue that space and place are inherently more than 
simply the location of a person and encompass the socio-historical relations between everything that constitutes our 
notions of place. It is time to break down disciplinary walls and begin collaborating. 

Geographic explanations of Twitter have largely relied on traditional geographic information, such 
as the latitude/longitude coordinates embedded in a tweet’s metadata. However, geography is much more 
than Cartesian location. Goodchild (2008) argues that geographic information in the form of Cartesian 
location can be derived from a number of heterogeneous, often confusing, user-generated data sources. 
Moreover, more qualitative approaches to geographic information science suggest ways in which we can 
spatially consider issues of place not just as Cartesian location, but as the socio-historical construction of 
contested spaces (Knigge and Cope, 2006). Thus geography is more than just “where a person is,” but 
includes one’s relationship to the geographic spaces, cities, neighborhoods, and communities in which we 
live. Unfortunately, socio-historical construction is not a metadata field in Twitter data. 

Human geography, in essence, highlights place as something more than location. An individual’s 
relationship to place is multi-faceted, socio-historical, and not necessarily bound to Cartesian space. These 
relational notions of space open the possibility for considering non-Cartesian modes of geographic 
information, such as a user’s place-based identity. This paper intentionally divorces a person’s relation to 
place from geographic location. We suggest that this is a beginning step in conceptualizing geographic 
relationships within information that are not tightly coupled to coordinate systems. 

The Occupy Wall Street protests (hereafter “Occupy”) are one example of activists utilizing Twitter 
to mobilize, motivate, and acquire resources. Twitter metadata offer both a location (in the form of 
latitude/longitude coordinates) and a self-identified “place” in the user profile data. The formation of 
Occupy networks around place-based identity occurred via hashtags, such as #OccupyDenver. This offers 
a unique opportunity to examine a user’s self-identification with a place in conjunction with their discussions 
about that place. When the place listed in a user’s profile matches the place in the hashtag of their tweet, 
e.g. Denver and #OccupyDenver, we refer to this as place congruence, irrespective of their location. As 
such, examining Twitter in the context of Occupy offers uniquely geographic insights about the use of 
Twitter within social movements in the formation of interest networks (Hemsley & Mason, 2013) related to 
contentious politics. 
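
A minimal Python sketch of such a place congruence check follows. The substring comparison, the function names and the `occupy` prefix handling are assumptions made for illustration and do not describe the matching procedure used in this study.

```python
import re


def hashtag_place(hashtag, prefix="occupy"):
    """Return the lower-cased place part of a tag such as '#OccupyDenver'."""
    tag = hashtag.lstrip("#").lower()
    if tag.startswith(prefix) and len(tag) > len(prefix):
        return tag[len(prefix):]
    return None


def is_place_congruent(profile_place, hashtag):
    """True when the self-reported profile place names the hashtag's place."""
    place = hashtag_place(hashtag)
    if not place or not profile_place:
        return False
    # Compare on letters only, so 'Denver, CO' and 'denver' both match.
    normalized = re.sub(r"[^a-z]", "", profile_place.lower())
    return place in normalized
```

Note that the check ignores the tweet's coordinates entirely, in line with the definition above. Abbreviated tags such as #OccupySLC would need an additional, hand-built alias table, which this sketch does not attempt.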

This exploratory work examines the role of place in the formation of networks of contention, or the 
structural relationships between persons with a shared interest in urban protest activities. We utilize 
McAdam et al.’s theoretical framework of contentious politics (2001) and its later conceptual extension by 
geographers (Leitner and Miller, 2007; Leitner et al., 2008; Martin et al., 2003) to justify the point of entry 
for our analysis. We employ statistical methodology drawn from social network analysis (SNA) (Wasserman 
and Faust, 1994) to understand what effect users’ self-identification with a place has on their 
communicative practices within these networks of contention. We find that it does have an effect, but that the magnitude 
of the effect varies spatially across different urban interest networks. 

Through these findings, we offer the following contributions: 1) previously lacking empirical 
evidence for geographers’ conceptualized links between network structure and contentious politics; 2) 
an advance in information analytic techniques that treats place as more than merely latitude/longitude 
coordinates; and 3) an approach for studying within and among multiple Twitter-augmented networks of 
contentious politics. These results provide empirical evidence of place’s role in communicative interactions 
among individuals. This points to exciting opportunities for studies of organizational formation, political 
participation, social network analysis, and the increasing role of social media platforms in the creation of 
the city. 


2 Literature 


Three conceptually overlapping literatures inform our work. First, studies of the geographies of 
user-generated data make heavy use of Cartesian location while theorists push for the extension of this work 
“beyond the geotag” (Crampton et al., 2013). Second, the relationships among individuals in communicative 
networks are examined in network theory and social network analysis. Finally, the theoretical framework 
of contentious politics (McAdam et al., 2001, 1996) and its conceptual extension by geographers demands 
an attention to the relational spaces of networks. Taken together, these literatures highlight the need for 
empirical studies at the intersection of place, communicative networks, and politics of resistance, which we 
address at the conclusion of this section. 




iConference 2014 Jeff Hemsley & Josef Eckert 


2.1 Geographies of user-generated data 


Studies of user-generated geographic information lean heavily on the location of users as represented by 
latitude/longitude coordinate metadata. In geography, theorists are beginning to implore that we move 
“beyond the geotag,” or latitude/longitude coordinates, to consider other ways in which place and space 
are implicated in the creation of user-generated data. 

Geographic information is more than topology, boundaries, and point locations. Goodchild (2008) 
argues that the future of geographic information science lies with the ability to derive geographic information 
from rapidly changing user-generated sources of data. Human geographers are pressing those in this field to 
move “beyond the geotag” (Crampton et al., 2013) to examine representations of place that cannot be 
cartographically represented. To date, nascent research on the geographies of user-generated data relies 
almost entirely on the study of latitude/longitude-based point locations of users. 

Geographers treat Twitter as an instantiation of the “geoweb” or “geospatial web” (Scharl and 
Tochtermann, 2007). This broad rubric posits the geoweb as web 2.0-styled user generated content that 
contains locational metadata, generally in the form of geotags. Empirical studies of the geoweb relate the 
conditions of the user’s location to other attributes of place. The spatial distribution of Google Maps 
placemarks created following Hurricane Katrina mirrors the spatial distributions of race brought about by 
deeply inscribed structural inequalities (Crutcher and Zook, 2009). Similarly, research shows that Wikipedia 
editors writing about the global south are frequently located in the global north (Graham and Zook, 2011), 
suggesting that digital representations will reproduce existing arrangements of inequality. The tie here to 
place extends beyond the locations of participants to include the context of those locations. 

Geographically-informed studies of Twitter also exhibit a reliance on latitude/longitude coordinate 
metadata. Geographic information science is quickly integrating Twitter data into its studies. Tweets have 
been used to correlate topic models with the location of fast food establishments (Ghosh and Guha, 2013), 
observed as reproductions of existing spatial, temporal, and socioeconomic patterns (Li et al., 2013), or used 
in mapping the multiple interpretations of Syria with respect to place (Stefanidis et al., 2013). 
Non-geographers have used network analysis to suggest that Twitter @-mention networks are best modeled 
against airplane traffic data (Takhteyev et al., 2012). And within geography, Stephens & Poorthuis 
(Forthcoming) relate geographic distance to the strong and weak ties posited by Granovetter (1973), tying 
geographic studies back into those of network theory. 

But these studies still rely on locational metadata rather than other attributes of place despite the 
flaws inherent in geotagged Twitter data. The accuracy of geotag coordinates varies widely, from several 
meters for GPS to several thousand meters for triangulation via a cellular network (Li et al., 2013). 
Moreover, locations derived from IP lookup are only as accurate as the databases from which they pull data 
(Li et al., 2013). Finally, only 1-2% of tweets are geotagged, so geotagged tweets are hardly representative of 
Twitter traffic as a whole. Thus, locational data associated with Twitter is adequate for low-resolution studies but may not 
be appropriate for a conceptualization that treats networks as more granular, relational spaces. Considering 
place rather than location allows us to include contextual information by proxy, such as the relationship, 
history, and meanings with which users self-identify by means of the presentation of a user profile. 


2.2 Network Theory: Where is Place? 


The unfolding of place and sited resistance occurs through the process of “everyday living,” or the multiple, 
often banal, exchanges among actors (Certeau, 1984). These processes enable resistance in the face of 
oppressive structures. There are multiple processes by which this occurs, but we argue that the exchange 
of information via Twitter is among them. The city and the public square unfold as places of contention, 
at least in part, through the ties formed among actors communicating about these places via a social media 
platform. 

Network theorists study the structure and process of these networks (how they form, how actors 
relate, and the emergent social processes enabled by them) through social network analysis (SNA). SNA 
is a methodology that supports the measurement and analysis of network structure (Butts, 2008; Wasserman 
and Faust, 1994). Network structure is defined as “the observed set of ties linking the members of a 
population” (Watts, 2004, p. 48). We conceptualize one set of such ties to be communications between 
Twitter users with the topology of these ties comprising a network structure. Network theorists tie the 
attributes of actors within a network to that network’s structure. For instance, political blogs with similar 
ideological affiliations share homophilous and often identical content (Nahon and Hemsley, 2011, 2013). 
Also, the geographic propinquity and contextual environment of actors are contributing components to the 
formation of networks (McPherson et al., 2001). More recent work on Twitter has extended this work to 
show that communication between Occupy protestors located within the same state tends to be on topics 
of a local nature, and that interstate communication is more often related to the mainstream media or 
focuses on a few individuals (Conover, Ferrara, Menczer, & Flammini, 2013; Conover, Davis, et al., 2013). 
This, however, frames the object of inquiry as locational ties rather than the place-based identity 
characteristics of those who make up the network. The structural formation of networks is owed at least in 
part to the attributes of the actors within it. 

Durable networks formed around shared interest likely contribute to a place’s capacity for 
contentious politics. Interest networks, or networks that form around topically similar content, can evolve 
into durable relations where actors engage in collaboration and collective action (Hemsley and Mason, 2013). 
While the topic of interest and the related communication network may be ephemeral, some ties among 
actors may have durability beyond the life of the topic. These invocations brought about through actors’ 
“everyday living” around place and protest-based interest networks form durable relationships that 
contribute not only to Occupy-related events, but have the potential to shape future contentious politics in 
a given place. Thus network theory is an integral component of the emergence of social media platforms as 
tools in the repertoire of contention. 


2.3 Contentious Politics and the Geographic Extension 


The theoretical framework of contentious politics (McAdam et al., 2001, 1996; Tarrow, 2011, 1998; 
Tilly and Tarrow, 2007; Tilly, 2003) is ideal for analyzing Occupy Wall Street. McAdam, Tilly, and Tarrow 
separate contentious politics from the study of “social movements” by highlighting the non-institutional 
interactions between groups making interest-based claims (2001). This framework suggests that the study 
of Occupy Wall Street need not be a study of a movement per se, but conforms more closely to the 
heterogeneous claims being made among different Occupy locations. The interaction was certainly 
non-institutional, using occupation and protests alongside Twitter and other social media platform-based 
communications as tools in the repertoire of contention (Tilly and Tarrow, 2007). 

Geographers’ extension of the contentious politics framework highlights the spatial dimension to 
illustrate resistance across different conceptualizations of place’s role in politics of resistance. This includes 
the comparison of activism across global and local scales (Kurtz, 2003; Martin, 2007), neighborhood or 
community-based organizing (Elwood, 2008, 2006; Martin, 2007), institutional hierarchies (Leitner et al., 
2008; Martin et al., 2003) and place-framing, the legitimization of an agenda by constituting place identity 
through communicative framing (Martin, 2003). Occupy drew inspiration from international resistances 
that formed in the place of the public squares of Tehran and Spain. Occupy was a national phenomenon 
locally instantiated in the individual public squares of their associated cities, but representative of lessons 
learned at the local, national, and global scales. 

Leitner et al. (2008) challenge human geographers to move away from scale as the sole arbiter of 
what makes contentious politics “spatial” and consider other spatial dimensions such as the relational spaces 
of networks. This comes at a time when researchers outside of geography have recognized technology’s role 
in constructing social structures that can be described in terms of networks (Castells, 1996; Latour, 2005). 
Thus Twitter exchanges mark a particular bounding of the Occupy Wall Street activities that allows us to 
leverage a framework of contentious politics using these communicative, placed networks. The newest 
extensions of literature along these lines encourage us to examine these relational network dynamics as they 
relate to place-framing (Pierce et al., 2011), power formation (Nicholls, 2009), and protest (Castells, 2012). 

The networked nature of protest (Castells, 2012) is alive and well on Twitter. Mainstream media 
lauded the role of Twitter in the “Arab Spring” revolutions earlier in 2011 (Howard and Hussain, 2011; 
Khondker, 2011). González-Bailón et al. (2011) examine the ways protestors are recruited through online 
Twitter networks and Gerbaudo (2012) examined Twitter activity across the Arab Spring, the work of the 
Spanish indignados, and even Occupy Wall Street more broadly. As Gerbaudo (2012) notes, while Twitter 
plays an important role in the development and deployment of networked protest activity it is still greatly 
overshadowed by the work of more traditional protest in popular media. Looking at Occupy specifically, 
Nahon et al. (2013) suggest that Twitter users used tweets to convey news, information, and wishes of solidarity 
to protestors within Occupy Wall Street. Twitter acts as a communicative platform-based augmentation to 
networks of contention. 


2.4 Exploring Intersections 


We identify a gap in empirical studies at the theoretical intersection of literature about the geographies of 
user-generated data, social network analysis, and the geographically extended conceptualization of the 
theoretical framework of contentious politics. Studies involving geographies of user-generated data are 
over-reliant on geolocation through latitude/longitude coordinates. Studies with a network-theory based 
approach are frequently treated aspatially. However, geographers studying contentious politics call us to 
consider the roles of space and place in relational networks, as these networks undergird new modes of 
resistance. While we have theoretical framings suggesting that place plays a formative role in digital media, 
there is a lack of empirical work that confirms the role of place in networks of contention. This work serves 
to fill that gap by answering the following research questions: 


1. Networks are made visible through the communicative exchanges among people. If people 
self-identify with a place (on their profile) and then identify a tweet with the same place (a condition 
that we will refer to as place congruence), in what ways does this condition affect their 
communicative practices in these networks of contention? 

2. In what ways is the role of place congruence similar and different among the multiple networks of 
contentious politics of Occupy? 


3 Methods 


We employ statistical techniques in this exploratory study in order to explore the relationship between a 
user’s self-represented place and network formation while drawing from geographers’ conceptualization of 
contentious politics. Each of the parts of our two-level analysis corresponds to one of our research questions. 
For the first question, we examine the role of place within in-network communications by employing a 
network variance model drawn from SNA methods. For our second research question, we treat each network 
as a unit of analysis and employ case study techniques (Yin, 2008). Our networks comprise users 
(individuals, shared accounts, or automated accounts) who interacted on Twitter using specific city-based 
protest activity hashtags (e.g. #OccupyDenver, #OccupyHouston) between October 19th and 
November 19th, 2011. 


3.1 MR-QAP Model 


The interactions among Twitter users and their related attributes (e.g., place) are fundamental to our work 
as factors in the formation of resistance networks. SNA models are used to describe how social distance 
between actors is related to the degree to which those actors exhibit similar behavior (Christakis and Fowler, 
2011), how relationship types shared by actors influence the exchange of novel information (Granovetter, 
1973) and how connection characteristics relate to influencing behavior (Barash et al., 2012; Brown and 
Reingen, 1987; González-Bailón et al., 2011). Thus, SNA’s focus on the relationships that constitute a 
network is well suited to analyzing the self-representation of place as a constitutive component of these 
networks. 

We use a network variance model to address our first research question. Specifically, we use multiple 
regression quadratic assignment procedure (MR-QAP) as outlined by Krackhardt (1988). 
The unit of analysis for this type of regression is the dyad, or pairing of individuals within the network who 
may or may not interact. This is apt for our work as we are exploring whether or not users being in-place 
is related to their linking to each other. Additionally, MR-QAP models have been shown to be robust for 
network datasets where sets of dyads cannot be assumed to be independent (Dekker, Krackhardt, & 
Snijders, 2003; Krackhardt, 1987, 1988). 

In this type of regression each variable is a matrix and the coefficients are estimated using ordinary 
least squares (OLS), and their significance is tested against a reference distribution generated via a Monte 
Carlo permutation of the model’s matrix variables. All of the analysis is done using the sna package for 
R, which uses the Double-Semi-Partialing (DSP) method for MR-QAP developed by Dekker et al. (2007). 
DSP employs a residual permutation method wherein an initial regression is run to calculate the residuals, 
which are then repeatedly permuted and entered into the model. The resulting coefficients form the reference 
distribution for each estimated coefficient. This approach has been found to be robust against confounding 
variables that might not be included in the model (Dekker et al., 2007). 
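
The permutation logic can be illustrated with a minimal Python sketch. It is not the DSP procedure of the R package used here: for brevity it swaps in the simpler approach of permuting the rows and columns of the dependent matrix together. The function names, the inclusion of an intercept and the default of 500 permutations are assumptions made for illustration.

```python
import numpy as np


def offdiag(m):
    """Flatten a square matrix, dropping the diagonal (self-ties)."""
    mask = ~np.eye(m.shape[0], dtype=bool)
    return m[mask]


def ols(y, xs):
    """OLS coefficients (intercept first) of matrix y on the matrices in xs."""
    response = offdiag(y)
    design = np.column_stack([np.ones(response.size)] +
                             [offdiag(x) for x in xs])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def qap_regression(y, xs, n_perm=500, seed=0):
    """Return (coefficients, two-sided permutation p-values)."""
    rng = np.random.default_rng(seed)
    observed = ols(y, xs)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        p = rng.permutation(y.shape[0])
        # Relabel actors: rows and columns move together, so the
        # dependence structure among dyads is preserved.
        permuted = y[np.ix_(p, p)]
        exceed += np.abs(ols(permuted, xs)) >= np.abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)
```

Because whole actors are relabeled rather than individual dyads shuffled, the reference distribution respects the non-independence of dyads, which is the property that motivates QAP-style tests in the first place.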

Each of the 8 cases in this study is a place-based Twitter communication network that is constructed 
from and limited to tweets with one of these hashtags: #OccupyCincinnati, #OccupyAtlanta, 
#OccupyDenver, #OccupyMemphis, #OccupyHouston, #OccupySLC, #OccupyOrlando, and 
#OccupyPortland. So for tweets to be included in our Denver network, they must contain the hashtag 
#OccupyDenver. These Occupy cities were selected with the intention of representing varied network sizes 
and geographic location to increase reliability. A map displaying the ratio of tweets within a hashtag to the 
total tweets within the dataset is seen in Figure 1. 

Figure 1: Ratio of tweets within an occupy hashtag to total tweets within entire dataset. 


Each of our networks is further bound by only including tweets that contain an @-mention. An @-mention 
is a feature of the Twitter platform that directs a tweet to another user by placing an @ in front of the user 
handle in the text of the tweet. Users are alerted by the platform when they are @-mentioned by another. 
@-mentions also occur when a user “retweets,” or shares, another user’s tweets. The text of such a tweet 
contains the prefix “RT” followed by the @-mention of the user whose tweet is being shared. 

By taking an @-mention as a communication between user accounts, we use tweets as communication 
trace data that form the links, or arcs, between users. This is known as an “arc sample method,” which 
takes relations rather than nodes as the sample for analysis (Butts, 2008). @-mention communication 
networks have been constructively employed to study political polarization within Twitter (Conover et al., 
2011). 
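
As a minimal sketch of this step, the following Python fragment turns (sender, text) pairs into counted, directed arcs. The regular expression, the lower-casing of handles and the dropping of self-mentions are assumptions made for illustration, and the sample tweets are invented.

```python
import re
from collections import Counter

# A handle is taken to be '@' followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")


def mention_arcs(tweets):
    """Count directed arcs (sender, mentioned user) over (sender, text) pairs."""
    arcs = Counter()
    for sender, text in tweets:
        for target in MENTION.findall(text):
            # Assumption: a user mentioning themselves is not an interaction.
            if target.lower() != sender.lower():
                arcs[(sender.lower(), target.lower())] += 1
    return arcs


# Invented sample tweets; a retweet contributes an arc via its 'RT @user'.
sample = [
    ("alice", "RT @bob: march at noon #OccupyDenver"),
    ("bob", "@alice @carol thanks #OccupyDenver"),
    ("alice", "@bob see you there #OccupyDenver"),
]
arcs = mention_arcs(sample)
```

Each key of the resulting counter is one arc, and its count is the tie strength, which is what makes the network both directed and valued.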

We further focus on interaction among users by only including users (and their links) who have 
been mentioned at least once and who have mentioned someone else at least once. This narrows the network 
to those individuals engaging with one another and helps reduce (but not eliminate) broadcast-style spam 
in the datasets. 
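
A minimal Python sketch of this filter is given below, assuming the network is held as a dictionary that maps (sender, target) pairs to mention counts. The function names are hypothetical, and the filter is applied in a single pass, which is an assumption: the text does not say whether it was repeated until no further users dropped out.

```python
def interacting_users(arcs):
    """Users who appear both as a sender and as a target of some arc."""
    senders = {s for s, _ in arcs}
    targets = {t for _, t in arcs}
    return senders & targets


def filter_arcs(arcs):
    """Keep only arcs whose two endpoints are both interacting users."""
    keep = interacting_users(arcs)
    return {(s, t): w for (s, t), w in arcs.items()
            if s in keep and t in keep}
```

An account that only broadcasts (mentions others but is never mentioned) or only receives mentions is removed together with its links.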

We represent the final network as a matrix and capture the direction of the communication (e.g., the 
frequency at which A @-mentions B as well as B @-mentions A). This directed, valued network forms the 
dependent variable in our MR-QAP model. Note that the model specification is the same for each of the 
place-based networks, detailed here: 


Each of the variables is in matrix form and is constructed as follows: 

Y: A directed, valued network comprising links that represent @-mention interactions in a given 
place-based network bound by an Occupy-city hashtag (e.g., #OccupyDenver). 

β1: The first variable is a dichotomous matrix representing cases where two users are both place 
congruent, meaning the place listed in their user-defined profile matched the place-based hashtags of their tweets 
(see discussion below for the place related data used to construct this matrix). For example, given users A, 
B, and C, where A & B are place congruent, cells A-B and B-A will contain a 1, while A-C, C-A, B-C and 
C-B will contain a zero. The estimated coefficient, if significant, represents the increase in tie strength (or 
the number of @-mentions) and addresses our first research question. 

β2: Users who reciprocate in communication have a higher likelihood of communicating in general 
(Steglich et al., 2006). We control for this reciprocity effect with a mutual-tie, dichotomous, symmetric 
network. Cells that contain the integer 1 correspond to users who have @-mentioned one another. Otherwise, 
the cells contain a zero. As above, if the estimated coefficient is significant, the coefficient represents the 
increase in tie strength. 

β3: Twitter users who have large followings produce a disproportionate amount of retweeted 
content (Hu et al., 2012; Kwak et al., 2010). Since retweets are a form of an @-mention, we include this 
variable to control for this effect. In this matrix, the value of the cells in each column are the number of 
followers for the user represented by that column. For example, if user-A has 200 followers, then the cells 
in user-A’s column contain 200. Wasserman and Faust refer to these as expansiveness and popularity effects 
(1994), or more recently, receiver effects (Nahon and Hemsley, 2011). We expect this variable to be both 
significant and positive. The estimated coefficient for this variable can be interpreted by noting that a unit 
change in X3 (followers) increases tie strength in the @-mention network by the estimated coefficient. 
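
The construction of the four matrices described above can be illustrated with a short sketch. This is not the authors' code: the function name, the input shapes (a list of sender/receiver pairs, a set of place congruent users, and a follower-count dictionary), and the use of plain Python lists are all assumptions made for illustration.

```python
from collections import Counter

def build_matrices(mentions, congruent, followers):
    """Build Y, X1, X2, X3 as square lists-of-lists over a shared user order.

    mentions  : iterable of (sender, receiver) pairs, one per @-mention
    congruent : set of users flagged as place congruent (hypothetical input)
    followers : dict mapping user -> follower count (hypothetical input)
    """
    mentions = list(mentions)
    users = sorted({u for pair in mentions for u in pair})
    idx = {u: i for i, u in enumerate(users)}
    n = len(users)

    # Y: directed, valued network; Y[i][j] = number of times i @-mentions j
    Y = [[0] * n for _ in range(n)]
    for (a, b), count in Counter(mentions).items():
        if a != b:
            Y[idx[a]][idx[b]] = count

    # X1: 1 when both users in the dyad are place congruent, else 0
    X1 = [[int(i != j and users[i] in congruent and users[j] in congruent)
           for j in range(n)] for i in range(n)]

    # X2: 1 when i and j have each @-mentioned the other (symmetric), else 0
    X2 = [[int(i != j and Y[i][j] > 0 and Y[j][i] > 0)
           for j in range(n)] for i in range(n)]

    # X3: every cell in a user's column holds that user's follower count
    X3 = [[followers.get(users[j], 0) for j in range(n)] for _ in range(n)]
    return users, Y, X1, X2, X3
```

With users A, B, and C, where only A and B are place congruent, cells A-B and B-A of X1 hold 1 and every other cell holds 0, matching the example given for β1.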

In the model notation, E represents the expected value of Y and ε represents the model residuals.
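
Read together, the variable definitions suggest a specification of the following form. This is a reconstruction from the surrounding text rather than the published equation: the intercept β0 and the predictor names X1 and X2 are assumed by analogy with the X3 mentioned under β3.

```latex
\[
E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon
\]
```

Here X1 is the place congruence matrix, X2 the mutual-tie matrix, and X3 the follower-count matrix, each with the same dimensions as Y.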


3.2 Quantitatively exploring case similarities and differences 

We explore similarities and differences among networks by treating each network as an independent unit 
of analysis, or case. We utilize a case study approach, outlined by Yin (2008), in identifying patterns related 
to the concept of place congruence. Our case selection supports analytic generalization by representing 
varied network sizes and geographic locations across 8 different Occupy place-based networks.

Case study methodology usefully frames our analytical approach. Our explorations make use of
several heterogeneous techniques. Yin identifies “the case study’s unique strength as its ability to deal with 
a full variety of evidence,” (2008, p.11) and as such offers a methodological framework for this study. This 
analysis utilizes descriptive statistics about each network, data visualizations that foreground comparisons 
among networks, and the variance in the results from our MR-QAP model. Thus a case study approach is 
particularly suited to evaluate multi-modal evidence across network structures. 

Particular to our model, we estimate the amount of variance that place congruence accounts for in each
of the networks. We re-run all of the regression models (see above) without β1, the place congruent
variable, and find the difference in the adjusted R². We employ a partial F-test to verify that the difference
in variance is significant (Ott and Longnecker, 1993).
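
As an illustration of the comparison described above, the textbook partial F statistic for dropping q predictors from a model with k predictors is F = ((R²_full − R²_reduced) / q) / ((1 − R²_full) / (n − k − 1)). The sketch below computes it from unadjusted R² values. The function name and the example numbers are hypothetical, and the choice of n (the number of observations) is an assumption, since dyads in a network are not independent observations.

```python
def partial_f(r2_full, r2_reduced, n_obs, k_full, q=1):
    """Partial F statistic for dropping q predictors from a k_full-predictor model.

    Uses unadjusted R-squared values; all inputs here are illustrative.
    """
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    df_resid = n_obs - k_full - 1  # residual degrees of freedom of the full model
    if df_resid <= 0:
        raise ValueError("not enough observations for the full model")
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)


# Hypothetical values, not taken from the paper's tables
f_stat = partial_f(r2_full=0.30, r2_reduced=0.29, n_obs=1000, k_full=3)
```

The resulting statistic would be compared against an F distribution with q and n − k − 1 degrees of freedom.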


3.3 Data


Our data is drawn from a corpus of Occupy related tweets collected and maintained by the Social Media 
Lab at the University of Washington (somelab.net). The full data set contains over 160 million tweets 
collected from October 19th, 2011 to June 30th, 2012 using Twitter’s streaming application programming
interface (API). The streaming API provides a continuous “stream” of tweets that match a given set of search
terms. As long as one of the terms is matched in the text, hashtags, @-mentions, or URLs of a tweet,
Twitter streams the tweet to our collection system. Along with the tweet text, Twitter returns metadata
fields such as a tweet timestamp, the number of people who follow the user, as well as the location and place
of the user. 

Twitter offers two pieces of locational metadata with each tweet if the user changed her settings 
(opted-in) to provide them. First, Twitter’s “location” field is tied to latitude/longitude coordinates (a
“geotag”). Location is gathered either through granular cell phone tower triangulation, or through GPS.
Second, Twitter’s “place” field is derived from these coordinates to determine the user’s city, state, or
country. In the case of a non-GPS enabled device, the “place” of the user is determined by comparing the
IP of the user’s device against a geo-IP database. This means that our generalization of a “geotag” or 
Twitter’s definition of “place” is not as accurate as frequently portrayed, and that the two cannot be treated 
as spatially equivalent to one another. 

Twitter also allows a user to define her own place as a user-defined text field. While this is not 
necessarily an accurate representation of a user’s physical location (Stephens and Poorthuis, Forthcoming),
we argue that this field is indicative of a user’s self-perceived relationship to a place. People may not reliably 
update their accounts with the most current information, or perhaps identify themselves as living in a 
hometown rather than a temporary location, such as studying at a university. For our work, this is more 
than acceptable. We are interested in relative location as opposed to actual Cartesian location. The user’s 
activities tied to their affective relationship to place remain relevant in the process of that city’s network 
formation. A geotag would likewise not necessarily capture a user’s relational ties to a location (e.g., loyalty 
to a hometown’s sports team or family and friends). 

This Occupy dataset is notable among other Twitter datasets for the proliferation of place-based 
hashtags (e.g. #occupydenver or #occupyhouston). We therefore identify users as “place congruent” when
their user-defined place matched the place-based hashtags of their tweets. So for #occupydenver, if the user 
profile contained “Denver” anywhere in the user-defined place field, that user would be “place congruent.” 
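
The matching rule just described can be sketched as a small helper. The function name is hypothetical, and treating the match as case-insensitive is an assumption, since the text does not state how letter case was handled.

```python
import re

def is_place_congruent(profile_place, city):
    """Return True when the city name appears anywhere in the user-defined place field."""
    if not profile_place:
        return False  # empty or missing profile field cannot match
    # re.escape guards against city names containing regex metacharacters
    pattern = re.compile(re.escape(city), re.IGNORECASE)
    return pattern.search(profile_place) is not None


# Example for the #occupydenver network
is_place_congruent("Denver, CO", "Denver")      # matches
is_place_congruent("Mile High City", "Denver")  # does not match
```

A plain substring search of this kind would miss colloquial names and neighborhood names for a city.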

For this exploratory work, we used tweets from October 19th to November 19th, 2011 that contained any
of our 8 Occupy city hashtags. Table 1 provides descriptive information for each network, including the 
total number of tweets and users in the dataset, followed by the number of tweets with an @-mention. The 
columns labeled “Nodes” and “Links” provide the number of users and links in the networks after filtering. 
The column labeled “Place congruence” represents the number of users who specifically list a place in their 
profile that matches the network location. To find matches we use a regular expression using the city name.

A map displaying the ratio of tweets deemed to be place congruent to the total tweets within a given
hashtag is shown in Figure 2.


City         Tweets   Users   @mentions   Nodes   Links   Place congruence
Cincinnati     1124     560         601      58      60                 17
Memphis        1864     814        1227      86     205                 19
Salt Lake      5937    2027        4629     275    1191                 80
Houston        6114    1700        4268     353    1410                 64
Orlando        7407    2148        4799     293    1241                 79
Atlanta       33773   11897       24611    1671    6294                292
Denver        52655   12715       40457    2170   14326                219
Portland      85629   17644       62425    3437   26092                953


Table 1: Descriptives by City

Figure 2: Ratio of place congruent tweets to total tweets within an occupy hashtag.


3.4 Limitations 


There are a few limitations of this exploratory work we wish to highlight. First, we store the user-defined 
place listed in the profile and the number of followers for each user as derived from the tweet metadata. If 
users in our set tweeted more than once, their location and follower count may change over the course of 
time. In cases where a user changed their location over the course of our study period, we included users as 
place congruent if they matched once. We used the average follower count across all tweets for a given user. 
Second, it is worth noting that a user’s place is not always associated with a particular city, but a 
neighborhood within that city. Likewise, users may use a colloquial reference to their city (e.g., “Mile High 
City” for Denver). Expanding matching criteria for place to include these users as “place congruent” is 
planned in future work. 
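
The per-user aggregation described in the first limitation (congruent if any tweet matched, follower count averaged across tweets) can be sketched as follows. The dictionary keys `user`, `profile_place`, and `followers` are hypothetical names for the tweet metadata fields, and the simple case-insensitive substring match stands in for the regular expression mentioned in the Data section.

```python
from collections import defaultdict

def summarize_users(tweets, city):
    """Collapse per-tweet metadata into one record per user.

    tweets: iterable of dicts with hypothetical keys
            'user', 'profile_place', 'followers'.
    """
    matched = defaultdict(bool)
    follower_counts = defaultdict(list)
    needle = city.lower()
    for t in tweets:
        place = (t.get("profile_place") or "").lower()
        # A user counts as place congruent if any one of their tweets matched
        matched[t["user"]] = matched[t["user"]] or (needle in place)
        follower_counts[t["user"]].append(t["followers"])
    return {
        user: {
            "place_congruent": matched[user],
            # Average follower count across all of the user's tweets
            "avg_followers": sum(counts) / len(counts),
        }
        for user, counts in follower_counts.items()
    }
```

Under this rule a user who listed Denver in one tweet's metadata and another city later would still be counted as place congruent for the Denver network.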


4 Findings 
Place congruence is significant in all eight of our models, as shown in Table 2. However, the effect is not
large. In Memphis, for example, the place congruence effect only accounts for 0.035% of each link. Our
control variable for mutual ties is also significant in all of the networks, but our control for receiver effects
(users having a tendency to @-mention people with many followers) was not significant in the smaller
networks and showed a wide range in terms of its effect. The difference in R² values between Model 1 (the
full model) and Model 2 (without place congruence) is also significant for all networks, as shown in Table 2.
These values are small, but consistent with the statistical behavior of a variable standing in as a proxy for
complex processes.

City            Place        Mutual     Receiver   R²      Mod 1    Mod 2    ΔAdj-R²
                congruence   tie        effect†            Adj-R²   Adj-R²
Cincinnati***   0.040**      0.985***   -0.591     0.403   0.402    0.395    0.0066***
Memphis***      0.035*       0.970***   -4.065     0.289   0.289    0.287    0.0019***
Salt Lake***    0.037**      0.978***   0.328*     0.213   0.213    0.207    0.0062***
Houston***      0.016**      0.987***   2.178*     0.323   0.323    0.322    0.0012***
Orlando***      0.029**      0.987***   0.556*     0.265   0.265    0.262    0.0024***
Atlanta***      0.004**      0.998***   0.185**    0.145   0.145    0.145    0.0002***
Denver***       0.014**      0.996***   0.252*     0.142   0.142    0.142    0.0007***
Portland***     0.003**      0.998***   0.118**    0.130   0.130    0.129    0.0003***

* p < 0.05, ** p < 0.01, *** p < 0.001; † For ease of reading, we multiply these coefficients by 10 million


Table 2: Findings by City 


Our model, including place congruence, explains a larger amount of variance for networks with fewer nodes. 
Figure 3 plots the amount of variance explained by place congruence against the number of nodes in each 
network. This suggests that our model, on the whole, performs better for networks with fewer nodes.

Figure 3: Change in R² by Network Size. Amount of variance explained by place congruence against network
size.


We also explored the relationship between the variance explained by place congruence and other network
measurements, such as network density, centralization, and diameter (Wasserman and Faust, 1994). Place congruence is not
related to density or centralization in our dataset, whereas networks with larger diameters appear to also
have a lower place congruence effect. This is consistent with our finding regarding the relationship between
network size and place congruence.

The amount of variance explained by the place congruence variable also changes when measured
temporally. We divided each place-based network into four week-long datasets, and ran our fully specified
regression on each dataset. Somewhat surprisingly, place congruence was not significant for the first week
of any network, but was significant in all networks from the second week onward. Additionally, the R²
for the smallest networks increased over time while for the larger networks it declined (see Figure 4).

Figure 4: Change in Model 1 R² by Network Size Over Time. Change in model R² over time for each
network. Network size shown as circle size.
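
The division into four week-long datasets could be carried out along the following lines. This is a sketch under stated assumptions: the function names and the tuple layout are hypothetical, weeks are counted in seven-day blocks from the start of the collection window, and any tweets falling after the fourth block are dropped, which the text does not specify.

```python
from datetime import datetime

def week_index(timestamp, start, n_weeks=4):
    """Return 0..n_weeks-1 for the week a tweet falls in, or None if outside the window."""
    days = (timestamp - start).days
    if days < 0 or days >= 7 * n_weeks:
        return None
    return days // 7

def split_by_week(mentions, start, n_weeks=4):
    """Group (timestamp, sender, receiver) tuples into n_weeks lists of (sender, receiver)."""
    weeks = [[] for _ in range(n_weeks)]
    for ts, sender, receiver in mentions:
        w = week_index(ts, start, n_weeks)
        if w is not None:
            weeks[w].append((sender, receiver))
    return weeks


# Assumed start of the sample window
start = datetime(2011, 10, 19)
```

Each weekly list of pairs can then be turned into its own network and fitted with the same model specification.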


Interestingly, we find that the amount of variance due to place congruence is stable over time. That is, 
while the model on the whole performed more poorly as networks grew and time passed, the performance 
of the place congruence variable stayed consistent over time, with the exception of Salt Lake City and 
Cincinnati. We regressed the model without place congruence against all weeks, for all networks. Figure 5
provides a plot showing the amount of variance explained by place congruence for each network, for each
week.

Figure 5: Variance explained by place congruence. Change in variance due to place congruence. Network


5 Discussion


We find that in networks of contentious politics, place matters. That is to say place congruence is a 
constitutive component of the formation of resistance networks. The small variance explained by our 
independent variable for users being place congruent comes as no surprise; conceptually, place-congruence- 
as-attribute stands in as a proxy for the myriad of attributes that comprise place, such as demographics, 
interpersonal relationships, history, governmental and community organizations, and a host of other
unknowns. To relate this back to the theory of contentious politics, it is clear that self-identified place is a 
constituent component of these emergent networks. 

But we also note that the degree to which place matters differs among our city-based networks and 
is related to the size and diameter of the network. Together this suggests that place congruence matters 
more in smaller, less complex networks. This complexity is likely, at least in part, found in the 
multidimensionality of place represented here by a user-defined proxy. 

Place is a surprisingly consistent factor over time, while the role of well-established factors in 
network formation (reciprocation and receiver effects) declines over time. But as the city networks grow 
over time, the models overall explain less variance. This suggests that the networks, and the factors that 
drive their growth, become more complex over time. As the amount of variance due to place congruence 
remains fairly stable, we suggest there may be a core set of actors who remained committed throughout our 
four-week sample.

Place is more than just a self-representation through a field of metadata. This initial foray suggests 
that seeking further contextual qualities of place would lead to a more robust model. Other modes of inquiry 
will be needed to ascertain the processes through which place asserts itself in contentious politics, but our 
work provides evidence to justify further work in this area. We suggest mixed methods approaches, including 
computational topic modeling and qualitative interviews, for ascertaining contextual qualities of these 
information exchanges that can be attributed to place. Topic modeling will offer us contextual information 
regarding the discussion within a placed hashtag, giving us a way to move beyond rote metadata fields 
through the development of novel mixed methods approaches for including context computationally. Finally, 
examining the ways in which protestors are co-located in place and converse across place-based hashtags 
might further inform our understanding of the exchange of information during protest activities. 


6 Conclusion 


In this paper we argue that place, as conceptualized by human geographers as essentially more than raw 
location to include contextual and socio-historical factors, is a useful analytic focus for researchers in 
information science. To accomplish this, we examined eight city-based Twitter networks, each a local 
instantiation of Occupy Wall Street, through the lenses of contentious politics and network theory. Our 
methodological approach employs social network analysis coupled with a quantitative case study approach 
to explore the role of place in the formation of resistance networks. 

Each network is conceptualized as an independent case, comprised of users who mention each other 
on Twitter and include a city-based hashtag (e.g. #OccupyDenver, #OccupyHouston, etc.) between 
October 19th and November 19th, 2011. Our case selection supports analytic generalization by representing 
varied network sizes and geographic locations across the U.S. This approach allows us to examine the role 
of place within each network as well as identify patterns among our networks.

We find that place is a constitutive component of these resistance networks, but that the amount 
of variance due to place in a network depends on characteristics of the network such as its size and 
complexity. We also find that the effect of place, in terms of variance explained, remains stable over time 
while established factors, such as reciprocation and receiver effects due to users having high follower counts,
decline in the degree to which they add explanatory power to the model.

This work fills an empirical gap at the intersection of human geography, network theory, and
contentious politics by highlighting the usefulness of social network analysis to geographic studies and by 
demonstrating the importance of place in the formation of networks of resistance. Our methodological 
contribution suggests contextualized and nuanced considerations of place may be fundamental to network 
formation through social media.


Abstract

Libraries are facing the challenge of innovation to meet user needs. Virtual worlds offer unique 
opportunities for curated, immersive, integrated, interactive, and flexible spaces. The Virtual Study Room 
is a unique design concept leveraging these opportunities and supporting information behavior in a virtual 
world. Participant feedback from a research event in Second Life indicates that the Virtual Study Room 
is a useful environment for individual and group information problem-solving, and serves as a model for 
the delivery of online library information services.


1 Introduction


Facing changing patterns of use and demand, many libraries are developing creative options to better serve
user information behaviors. Virtual Worlds (VWs) offer immersion, integration, and flexibility in the 
delivery of information services. Through the development of a virtual study room in a virtual world, the 
Virtual Information Behavior Project (VIBE) explored the potential of offering information services as a 
3D, immersive, integrated, interactive, and flexible experience. The Virtual Study Room (VSR) was part of 
a unique research event in Second Life (SL), a social virtual world in which residents interact via groups, 
events, meetings, classes, etc. 

The purpose of the VIBE project is to investigate the nature and patterns of everyday life 
information problem-solving behavior in immersive virtual environments. VIBE researchers learned from 
the first phase of the project that users typically enter SL to participate in groups, attend events, engage 
in activities, and to interact socially with others. Information access, evaluation, use, and sharing behaviors 
are generally by-products of social interactions. Researchers concluded that SL is not yet used or perceived 
as an environment for information problem-solving; that is, addressing the need of seeking, assessing, 
organizing, synthesizing, and sharing information in everyday life situations. 

However, immersive virtual environments such as SL do offer rich potential as information problem- 
solving settings because of emerging patterns of use and expectations (Wasko et al., 2011; Berente et al., 
2011), immersive 3D capabilities, and integration of cutting-edge technologies. Project findings made it 
clear that this potential has not yet been realized, and led to the design and implementation of the Future 
of Information Seeking and Services Exposition (Future InfoExpo). Users and information providers need 
experiences with new affordances. Therefore, this Future InfoExpo was a milestone in enabling users and 
information providers to experience and envision the future of their information practices and to provide 
feedback on their experiences. 

The Virtual Study Room (VSR) exhibit of the Future InfoExpo provided participants a curated, 
immersive, integrated, interactive, and flexible environment for the express purpose of information problem-solving.
Participants were given a scenario of a specific research task, and asked to think about how they
might use the various features and elements of the VSR to conduct their research. This exhibit integrated
many of the single features of the other exhibits: additional avatar profile detail information; unique, unusual 
tools for information organization; and in-world web access. 

Moreover, the VSR is a design concept offering a new model for library information systems and 
the delivery of services. Libraries, faced with the challenge of meeting user needs in a digital environment, 
typically replicate the model of information service delivery applied to physical libraries (Chow et al., 2012). 
Information seekers and problem-solvers already familiar with VWs are beginning to demand social 
networking and information tools and services that are seamlessly integrated into a 3D, immersive virtual 
environment (Wasko et al., 2011). A growing number of these users are becoming accustomed to the
affordances of VWs through social exploration (e.g., Club Penguin, Second Life) and gaming (e.g., League 
of Legends, World of Warcraft). The VSR is designed to be a curated, immersive, integrated, interactive, 
and flexible environment, conducive to information problem-solving and suggestive of a new direction for 
the delivery of library information services. 

The questions guiding this investigation ask: 


• How do SL residents view the concept of the VSR in terms of usefulness, ease of use, and their
  likelihood of using one?
• What would a VSR look like in terms of capabilities and functions?


2 Background 


Facing the challenge of meeting user needs in a digital environment, libraries typically provide information 
services online that replicate the physical library model. Users entering the physical library expect access 
to a catalog of the library’s collections of print and digital resources. Users also expect access to information 
databases and a reference desk staffed by a librarian. Users could also expect displays of books related to a 
particular author or theme, and scheduled events such as book talks, book groups, career training sessions, 
etc. The typical library presence online does not stray from this model of services. The library home page 
will likely include a catalog for the library’s collection of print and digital resources, links to information 
databases, and perhaps a live chat box for support from a librarian. Online tools may organize resources on 
a particular topic or literary genre (e.g., Libguides, Shelfari), video tutorials may address career and training 
needs, and users may interact via any number of social networking tools to share reading experiences. 
Simple replication, however, fails to support focused study and research, since the associations among 
information elements are dispersed among many separate tools and pages. 


2.1 Information Curation 


Library services can remain effective, not through information provision, but through information curation. 
In a study of the information behaviors of college-age students, Head & Eisenberg (2009) found that students 
did not readily turn to a librarian for help with academic and everyday life-related tasks. Gross (2005, 2007) 
found that college students inaccurately identified themselves as information literate, based in large part 
on their familiarity with locating and accessing information, despite struggling with other stages of the 
information problem-solving process. Historically, limited availability of relevant sources of information 
presented a barrier to successful problem-solving. Successful problem-solving, however, involves not only 
the access of information, but also problem definition, determination of appropriate resources, evaluation 
of source relevance and reliability, comprehension of information, synthesis of information from a variety of 
sources into a coherent response, and evaluation of the product and process (Eisenberg, 2008). Offering 
information services to meet user needs and practices requires that the information service provider have a 
good understanding of how people approach addressing their information needs. Information needs may be 
framed as information problems. Many models exist describing how users approach information
problem-solving.

The field of Library and Information Science offers several models of information problem-solving.
The Big6 Skills approach by Eisenberg & Berkowitz (1990) describes the stages of successfully solving an 
information problem. This and similar models (e.g., Kuhlthau, 1991; Stripling & Pitts, 1988) provide a 
framework for organizing the many aspects of information behavior as described by Wilson (1999). The 
traditional model for library systems and services focuses on meeting the user at the middle stage of 
information Location and Access, as described by the Big6 model, through connecting to collection catalogs, 
information databases, and reference desk services. Models of the information problem-solving process 
identify the stages before and after the access of new information that are crucial to success. Therefore, a 
guiding principle for the development of a new model for library information systems and services is to 
address every stage of the information problem-solving process. The VSR offers users a space, both 
intellectual and virtual, to meander through these multiple stages. 

The concept of information curation addresses the needs of library users at every stage. Adopted 
from the field of archival studies, information curation is being developed in educational contexts to describe 
the need for students to synthesize, personalize, and share information of various types from various 
channels and sources (Mihailidis & Cohen, 2013). Jenkins (2009) identifies processes that are enabled in an 
environment of real-time and constant information flow: archiving, annotating, appropriating, and 
recirculating. As students engage in these processes to answer questions, complete tasks, solve problems, or 
simply make sense of the world, value is added to this synthesis of information—it becomes a unique mix 
of personal, social, professional, and civic information (Mihailidis & Cohen, 2013). Students become users 
and producers of information and ideas (AASL/AECT, 1998) and engaged in the participatory culture as 
described by Jenkins (2009). Information curation offers the promise of a dynamic and personalized learning 
space that integrates information access, evaluation, use, and sharing tools into one immersive environment. 
This concept of information remix serves as a guide in the crucial development of a new model of library 
information systems and the delivery of services as users become increasingly networked. 


2.2 Virtual World Affordances 


The advent of virtual worlds provides a range of affordances that hold promise for realizing this vision. 
VWs are “3D immersive, computer-simulated environments where users are represented by avatars through 
which they interact in real time with other avatars, objects, and the environment,” (Wasko et al., 2011, p. 
646). VWs are distinct from other online forms of social interaction, such as independent discussion forums 
or web-based social networking tools, in that VWs can bring users to a state of immersion. In this state, 
the user is mentally and/or physically engrossed in the environment to the extent that the user’s self- 
awareness is changed. The related concept of cognitive absorption is a factor determining the quality of the 
user’s experience, and the likelihood of returning to the environment (Goel et al., 2011). Immersive 
environments offer a richer experience than two-dimensional counterparts (Nah et al., 2011). Contributions 
to this enhanced experience include multiple channels of communication; synchronous contact with others, 
objects, and the environment; a distinct set of social cues; and, typically, opportunities for co-creation. Like 
other VWs, SL affords users extraordinary capabilities in an immersive 3D environment. However, unlike 
VWs known as MUDs (multi-user domains) or MMOGs (massively multiplayer online games), SL is not 
based on a win/lose competitive paradigm (Ostrander, 2008). Rather, the bulk of SL activity is in its 
residents’ social interactions through groups, activities, events, and meetings. Therefore, SL is a social 
virtual world. 

SL currently fails to offer seamless integration of tools supporting information access, evaluation, 
organization, and sharing. Users interviewed for the initial phase of the VIBE project described situations 
in which they needed to leave SL in order to access a web browser or other tool, or they attempted to 
maintain focus simultaneously on the access or evaluation of information outside of SL and interaction with 
other users inside of SL, typically failing in the attempt. In order to maintain the sense of immersion in SL,
it was necessary to design tools that were seamlessly integrated into the SL environment and adhered to
the following principles: 1) they supported distributed collaboration in a collocated VW place, such as a 
slideshow presentation that could easily be viewed by all assembled avatars; 2) they maintained super-real 
qualities, in that they appeared as objects and functionalities that were logical within the SL environment 
and afforded 3D-specific properties; and 3) they provided social interaction across platforms, enabling 
integration across other computational and social media applications without needing to exit the immersive 
environment of SL. Other exhibits in the Future InfoExpo provided designs for tools that were seamlessly 
integrated into the SL platform to support these information behaviors and enabled undisrupted immersion. 
The Virtual Study Room (VSR), however, brought many of these tools together to provide a near-seamless 
integrated experience. 

Immersion and integration are principles guiding the design features conducive to information 
problem-solving in a digital environment. Other principles explored in this design study are interactivity 
and flexibility. In SL, users interact via avatar and other communication modes such as text chat and audio 
chat; these modes are all available in the VSR, enabling group information problem-solving. The VSR 
exhibit addresses the problem of limited space in the physical world by investigating the possibilities of 
flexible study spaces in VWs. Space for work and study is precious in the physical world—places to think, 
to carry out various information activities (e.g., search and collect, store and organize, process and create), 
and to meet with others and share. It’s not feasible to have a different space for every major task, project, 
or topic. However, this can be done in virtual space. In VWs, it’s possible to have multiple work or study 
areas—spaces for gathering resources, materials, and tools for a specific purpose, and for keeping those 
things exactly in place so that they may be accessed at the user’s convenience. In a VW, users can have a 
different study room for every major topic, project, or question they wish to investigate. Thus, a room may 
be initially curated around a particular topic, and then offered as an information tool through the library 
for the user to archive, annotate, appropriate, and recirculate as the user sees fit. 


2.3 Libraries in Virtual Worlds 


Virtual environments hold the promise that information problem situations will be facilitated by an 
immersive and flexible environment in which information and communication systems are seamlessly 
integrated. Wasko et al. (2011) note the shift in the Gartner Hype Cycle of virtual worlds to, “a phase 
where real benefits, rather than hyped expectations, are starting to hit the mainstream with potentially 
transformational technologies,” (p. 654). They note anecdotally that typical 10-year-olds are far more 
interested in the avatar experience in virtual environments than in what other people are doing on social 
networking sites. In time, these users and those of the online gaming generation will likely demand changes 
in the way that, “socializing and work occur to the extent that we will most likely see the borders between 
work, play, and learning dissolved or at least be reshaped,” (p. 646). Chaturvedi, Dolk & Drnevich (2011) 
conclude that VWs form a new type of information system, a type that is not yet accurately described by 
current information system design theories, but will become increasingly integral to a comprehensive 
conception of information systems. Livingstone (2011) recently noted: “Through successive waves of hype 
and anti-hype, the educational use of Second Life has quietly, slowly, and gradually developed and grown— 
seemingly impervious to the media din” (p. 62).

These findings are corroborated by a comprehensive study focusing on virtual world libraries by 
Chow et al. (2012). This study found that re-creation of the traditional services model found in physical
libraries would likely result in ineffectiveness and low usability; virtual users were experience-oriented rather 
than information object-oriented, and expected to experience the full potential that VW technology affords 
in interacting with information and other users. While they identified the distinct information needs and 
expectations for the groups of traditional and virtual users participating in the study, they found that 
traditional users were interested in using the virtual library once they were aware that it existed and saw 
its potential. Researchers state: “Overall, efficiency, effectiveness, and satisfaction for patrons therefore
could be considered low for both traditional and virtual patrons” (Chow et al., 2012, p. 504). They conclude 
with recommendations for the design of virtual libraries, which include taking advantage of the 3D, 
immersive, and social affordances of the SL environment, and avoiding duplication of traditional systems 
and services in the VW. They cite the success of two popular SL libraries, the Caledon Libraries and the 
Alliance Virtual Library, as being linked to the focus on community programming and exhibit creation. 
These libraries offer live chat help with an avatar (versus a textual chat box), readings and storytelling 
events using the multiple communication modes available in SL (text, audio, and visual), workshops on a 
variety of topics of interest to users (SL object creation, job-seeking, and social networking), and experiential 
learning opportunities. 

The development of library information systems and the delivery of services in VWs have been 
exploratory to date. Recognizing the potential of VWs for effective systems and services, many public and 
academic physical libraries established a VW presence, designing VW library branches that closely 
resembled their brick and mortar counterparts. The initial phase of the VIBE project revealed that users of 
SL typically do not turn to SL for information purposes. Rather, information is accessed, evaluated, used, 
and shared in the course of social interaction for most users, and much of this information is related to SL- 
specific operations and events rather than factual information related to specific topics like health or politics. 
Moreover, while the proportion of first-time users is high, there are a number of persistent users; that is, 
those who enter SL regularly, have been engaged in groups or events over a long period of time, and 
contribute to the VW community through creation of objects, planning of events, and leadership within 
groups. For these users, a significant amount of in-world information behavior was identified; they identified 
accessing others as sources of information in-world, and accessed textual information outside of SL, thus 
moving in and out of the virtual environment to satisfy their information needs. This was surprising to the 
VIBE project team, given the number of virtual libraries set up in SL. These locations were rarely inhabited, 
and users interviewed during observations in these locations were there for exploratory purposes rather than 
information seeking purposes. Clearly, there is a real opportunity for digital libraries to provide users with 
meaningful interactions and experiences, such as that of a Virtual Study Room. 


3 Method 


This investigation sought to demonstrate that a digital space curated around a particular topic would 
provide users immersion, integration, interactivity, and flexibility — key hallmarks of the virtual worlds 
(VW) experience. We also sought to demonstrate the usefulness of such a space because of its ability to 
support information problem-solving, supporting the processes identified by Jenkins (2009) of archiving, 
annotating, appropriating, and recirculating. The goal was to offer participants a sense of the nature and 
scope of such virtual spaces, demonstrating some of the capabilities, functions, and uses. 


3.1 Study Design 


The Virtual Study Room (VSR) was one of six exhibits of the Future of Information Seeking and Services 
Exposition (Future InfoExpo), an event in Second Life (SL) enabling users and information providers to 
experience and envision the future of their information practices and to provide feedback on their 
experiences. The VSR exhibit (Figure 1) provided participants a curated, immersive, integrated, interactive, 
and flexible environment for the express purpose of individual and group information problem-solving. The 
room integrated in-world information access, evaluation, and organization: tools to access web-based 
information sources, tools to organize information, and 3D artifacts for tagging and manipulation. 
Participants wore a heads-up display (HUD), a unique tool that is visible only to the participant, tracks 
progress through exhibits, and enables assistance from a researcher via chat box. The participants could 
modify the various information elements in the room; however, the room was not uniquely customizable for
each individual participant because other participants were simultaneously making changes. 

Participants first viewed a 10-minute video based on a scenario. This VSR focused on medical
information: “Mike,” a university teacher who has been suffering from chronic back pain, has decided to 
address his problem as a major research task and conduct a thorough investigation in order to find out 
more about the nature of back pain and treatment. Participants were then given a scenario of a specific
research task, and asked to think about how they might use the various features and elements of the VSR
to conduct their research.


Figure 1: The Virtual Study Room immerses users in their information, thus allowing libraries to fully 
realize a robust, comprehensive service delivery platform; Figure 1 illustrates an example of a VSR focused 
on information related to chronic back pain. 


3.2 Data Collection 


Participants were asked to complete a pre-survey after viewing the video and a post-survey after
experiencing the VSR. Both surveys asked the same questions in the same order to make comparisons
possible, and offered both Likert-scale and open-ended response options. Both surveys asked participants 
about whether the VSR met expectations, whether it was seen as useful, likelihood of using a VSR, likelihood 
of entering SL explicitly for using a VSR, and opinions on its ease-of-use. A total of 85 participants 
responded to these survey questions. 


4 Results 


132 participants were asked to evaluate the Virtual Study Room in terms of their expectations prior to 
entry, usefulness, likelihood of use if it were available in Second Life, likelihood of entering Second Life 
specifically for the purpose of using the Virtual Study Room, and overall ease-of-use. 


4.1 Expectations 


Many participants reported the introductory video helped them envision a VSR, and they were positive 
about the concept. After entering and experiencing the room, most participants remained positive: 


• “... so much cooler, Very interactive, I brought a couple of pictures right into the [VSR], and also
browsed the web too.”

• “... Numerous screens granting the ability to keep many projects open at once. Resources surround
Mike. He is immersed.”

• “... nice way to combine several different methods of research into one area.”


Participants highlighted the VSR’s technical ability to support many different information modes
simultaneously as well as its utility as an information collection, organization, and immersion space.


4.2 Usefulness 


Most participants saw the room as useful and could see the potential for 3D VWs to support information 
problem-solving: 


• “... this gives you the real time capability of chat, communication, and interactivity via avatar to
make changes and complete documents- many steps above google docs ...”

• “... useful to have the browser, the notes and the 3D models in once place.”

• “... VW are persisitant. [sic] I can leave materials out and not fear that they will be lost or changed.”


By leveraging affordances of the virtual, the usefulness of the VSR resonated with participants because it 
brought together many separate elements. The persistence of VWs, unlike the churn of the Web, offered a 
convenient and reliable storage capability. 


4.3 Likelihood of Use if Available in Second Life 


After viewing the video, 69% of participants reported being “likely” or “very likely” to use this tool if it were
available in SL. Encouragingly, 72% of participants were equally positive after experiencing the VSR:


• “I find it helpful to be able to visualize projects im [sic] working with if i lay out a miniature mock
up of it.”

• “It's useful to be able to back and forth from working spaces to designing spaces and vice versa.”

• “I like to step away from ideas and return to them. Leaving a project in place and easily
editing/adjusting them in such an environment gives me time to reflect. Also, the value of having
others "drop by" to give you feedback on work is invaluable.”


Because information can be highly conceptual and abstract, participants valued the visual and immersive 
aspects of information organization offered by the VSR. Participants also appreciated the way that the 
VSR offered consistent storage methods, which affords both individual reflection and team collaboration. 


4.4 Likelihood of Entry into Second Life for the Purpose of Use 


After viewing the video, a majority of participants (62%) were “somewhat likely” or “very likely” to enter 
SL explicitly to use a VSR. After experiencing the VSR, 72% felt similarly. Qualitative explanations were 
positive and similar to those reported above. Participants offered these insights, however: 


• “... the user needs to be inside SL to access it. If the user has an idea while not in SL, they would
have to remember it, write it down and add it later.”

• “If it just replicates my existing desk environment, it only offers marginal utility. If instead it can
short circuit some of the manual steps one has to make in research, then it has unique value.”

• “... may be in the future.”


Participants felt that the current VSR design may be too cumbersome to supplant their current practices. 


4.5 Overall Ease-of-Use 


After viewing the video and experiencing the VSR, participants were surveyed concerning ease of use: 83% 
reported that the VSR was “very” or “somewhat” easy to use. When asked if they encountered difficulties 
when using the VSR, participants reflected on the limitations of current technology: 


• “Not familiar with the interface; there's a learning curve.”

• “The Posted Notes are not easy to move around.”

• “... other people still managed to be in the way.”

• “... better if I could see the print more clearly.”

• “If there were any way to physically draw, like on a real whiteboard (i.e. chemical compounds is
not easy to make on a keyboard) that would be a plus.”

• “... Focus only on what SL can UNIQUELY offer, don't just blindly replicate the RL [real life].”

• “Elements to facilitate interaction with people doing parallel research ...”

• “showing the community what is useful ...”


These comments demonstrate a high level of engagement with the VSR by offering both concrete and 
conceptual ideas for improvement. Participants have a need to communicate visually, and feel limited by 
text-oriented input devices. The social and collaborative potential of the VSR resonated with participants, 
since research can often be a needlessly lonely journey. Participants’ insights and reflections offer strong 
suggestions for researchers and designers of VWs as well as digital libraries. 


5 Discussion 


The first question guiding our investigation asked: 
• What would a VSR look like in terms of capabilities and functions?


The design for the VSR imagined how that kind of space might function in SL. The capabilities and 
functions included were curated, immersive, integrated, interactive, and flexible to support information 
problem-solving. The VSR provided tools for users to access, evaluate, organize, and share information. The 
Heads-Up Display may be modified to enable real-time assistance from a librarian via in-world chat or as 
an avatar, providing an immersive experience not currently available through “librarian chat” applications. 
Since countless rooms may be curated around a particular topic and made available to users for archiving, 
annotating, appropriating, and recirculating, this model offers greater potential for usability than many 
examples of digital libraries currently available. 

The VSR model may serve as an example for library programs striving to provide information
systems and to deliver services that users find useful. A library could offer template VSRs based on frequent
topics of inquiry. These templates could be age-appropriate. For example, a VSR about giraffes for
4th-graders need not match the complexity or volume of resources of a VSR about African savannah
ecosystems for a high-school or undergraduate student.

For libraries, VSRs could be curated multi-media collections (articles, books, maps, magazines, web sites,
links, videos, podcasts, images, etc.), all collocated, curated, equally accessible, and simultaneously available.
Leveraging the affordances of virtual space, libraries could offer as many VSRs as requested, customizable by
topic, grade level, or level of experience (e.g., “Knitting for Newbies!”). Content could be updated or revised
as needed.

VSRs need not replace traditional ‘Readers’ Advisory’-type displays or services. Rather, VSRs may 
complement these by being customized for specific interests. For example, a general Readers’ Advisory on 
‘Historical Romances’ can be complemented by a VSR on ‘Steampunk Romances with a Strong Female 
Protagonist’ (e.g., Cooper’s St.Croix Chronicles series or Clare’s Infernal Devices trilogy). VSRs have the 
additional benefit of presenting readers with related material: historical-era popular music could be playing, 
or readers could experience era-specific technologies (e.g., penny-farthing bicycles), or their avatars could 
wear historical clothing, etc. In this way, VSRs could expose readers to the broader contexts behind many
library materials. 

The second question guiding our investigation asked: 


• How do SL residents view the concept of the VSR in terms of usefulness, ease of use, and their
likelihood of using one?


The design principles of immersion, integration, interactivity, and flexibility were confirmed by participants
as appealing and useful aspects of the VSR. Other patterns in the responses from participants became 
evident as well. In conceptualizing new information practices, the introductory video was useful in 
communicating to participants a bold and unusual vision of future information practices and in creating a 
picture of how the space could be used. Participants quickly grasped the purpose and functions from the 
video introduction, and responded favorably both to the video and to experiencing the proof of concept 
exhibit afterward. A similar approach would likely increase the perceived usefulness and effectiveness of 
similar tools offered by a library program. A majority of responses to surveys, both before and after
experiencing the VSR, were very positive in terms of willingness to use or seek out the VSR, its ease of use,
and its usefulness. Users first need to know that such capabilities exist, so strategies for promoting and 
demonstrating the value of the VSR must be implemented. 

Specific features were identified as particularly useful. Participants recognized the potential of 
creating objects to embody information, to tag for access, and to manipulate in creative ways. They also 
recognized that this VSR space could be curated around a topic of interest, updated dynamically with 
related information, and visited by collaborators for additional input via multiple communication channels. 
The potential for mixing content, methods of research, and lines of inquiry greatly increased the likelihood 
of serendipitous discovery, a key component of research. 


6 Conclusion 

The VSR, as one of the exhibits in the Future InfoExpo, is important because it demonstrates “proof of 
concept” of new or expanded functions and capabilities in 3D, immersive, virtual environments and offers 
clear directions for future research. Users and information providers experienced, envisioned, and evaluated 
possible information practices in virtual environments and provided feedback on their experiences. Many 
contend that the potential of VWs has yet to be realized (Wasko et al., 2011); VWs will likely transform 
the way we access, evaluate, organize, use, and share information. Kuhlthau (2004) identifies the affective 
dimension of the information problem-solving process; further research is needed to investigate the impact 
a VW environment would have on affect. Further research is also needed to determine the efficacy of delivering
information services via the VSR, and the potential for the VSR as both a personal and collaborative 
information management system. The VSR offers a glimpse of how an immersive, integrated, interactive, 
and flexible information problem-solving environment can support and change information practices, and 
serve as a model for library information systems and the delivery of services. 


7 References 

AASL/AECT. (1998). Information Power: Building Partnerships for Learning. Chicago, IL: American 
Library Association. 

Berente, N., Hansen, S., Pike, J. C., & Bateman, P. J. (2011). Arguing the value of virtual worlds: Patterns
of discursive sensemaking of an innovative technology. MIS Quarterly, 35(3). 

Brand-Gruwel, S., & Wopereis, I. (2006). Integration of the information problem-solving skill in an 
educational programme: The effects of learning with authentic tasks. Technology, Instruction, 
Cognition, and Learning, 4, 243-226. 

Chaturvedi, A. R., Dolk, D. R., & Drnevich, P. L. (2011). Design principles for virtual
worlds. MIS Quarterly, 35(3), 673-685. 

Chow, A. S., Baity, C. C., Zamarripa, M., Chappell, P., Rachlin, D., & Vinson, C. (2012). The
Information Needs of Virtual Users: A Study of Second Life Libraries. Library Quarterly, 82(4),
477-510. doi: 10.1086/667436




Eisenberg, M. (2008). Information Literacy: Essential Skills for the Information Age. DESIDOC Journal 
of Library & Information Technology, 28(2), 8. 

Eisenberg, M., & Berkowitz, R. (1990). Information Problem-Solving: The Big6™ Skills Approach to 
Library & Information Skills Instruction. Norwood, NJ: Ablex. 

Eisenberg, M., & Brown, M. (1992). Current Themes Regarding Library and Information Skills 
Instruction: Research Supporting and Research Lacking. School Library Media Quarterly, 20(2), 
103-110. 

Gartner, Inc. (2013). Hype Cycles. Retrieved July 18, 2013, from
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp

Goel, L., Johnson, N. A., Junglas, I., & Ives, B. (2011). From Space to
Place: Predicting Users' Intentions to Return to Virtual Worlds. MIS Quarterly, 35(3), 749-771. 

Gross, M. (2005). The Impact of Low-Level Skills on Information-Seeking Behavior: Implications of 
Competency Theory for Research and Practice. Reference & User Services Quarterly, 45(2), 8. 

Gross, M. (2007). Attaining information literacy: An investigation of the relationship between skill-level, 
self-estimates of skill, and library anxiety. Library & Information Science Research, 29, 21.

Head, A. J., & Eisenberg, M. (2009). Lessons Learned: How College Students Seek Information in the 
Digital Age Project Information Literacy Progress Report (pp. 42): Information School, 
University of Washington. 

Jenkins, H. (2009). Confronting the challenges of participatory culture: media education for the 21st 
century. Cambridge, MA: The MIT Press.

Kuhlthau, C. C. (1991). Inside the Search Process: Information Seeking from the User's Perspective. 
Journal of the American Society for Information Science, 42(5), 361-371. 

Livingstone, D. (2011). Second Life is Dead. Long Live Second Life? EDUCAUSE Review, 46(2), 2. 

Mihailidis, P., & Cohen, J. N. (2013). Exploring Curation as a core competency in digital and media 
literacy education. Journal of Interactive Media in Education. 

Nah, F., Eschenbrenner, B., & DeWester, D. (2011). Enhancing brand equity through flow and 
telepresence: A comparison of 2d and 3d virtual worlds. MIS Quarterly, 35(3). 

Ostrander, M. (2008). Talking, looking, flying, searching: information seeking behaviour in Second Life. 
Library Hi Tech, 26(4), 512-524. 

Stripling, B. K., & Pitts, J. M. (1988). Brainstorms and blueprints: teaching library research as a thinking 
process. Englewood, Colo.: Libraries Unlimited. 

Wasko, M., Teigland, R., Leidner, D., & Jarvenpaa, S. (2011). Stepping into the internet: New ventures 
in virtual worlds. MIS Quarterly, 35(3), 645-652. 

Wilson, T. D. (1999). Models in information behaviour research. Journal of Documentation, 55(3), 249- 
270. 


8 Table of Figures 


Figure 1: The Virtual Study Room immerses users in their information, thus allowing libraries to fully
realize a robust, comprehensive service delivery platform; Figure 1 illustrates an example of a VSR focused
on information related to chronic back pain.


Thinking About Context: Design Practices for Information Architecture with
Context-Aware Systems


Jared S. Bauer¹, Mark W. Newman² and Julie A. Kientz¹
¹ University of Washington
² University of Michigan


Abstract 

The ubiquity of low-cost, sensor-rich, mobile computing devices has meant that designing context-aware 
systems is now a common concern in Information Architecture. Because of how prevalent context-aware 
systems have become, the way in which practitioners design context-aware systems is of great importance. 
While design frameworks and models have been proposed for context-aware computing systems, there 
has not yet been research that focuses on how designers’ views of context influence Information 
Architecture practices. To address this, we present an empirical analysis of 11 in-depth interviews with 
designers of a variety of context-aware systems. Our analysis of these interviews, along with a review of 
the artifacts produced during the design of these systems, uses the theoretical lens of professional vision 
to illuminate how designers view, use, and account for context in the design process. Our analysis of the 
artifacts and interviews reveals that designers’ perspectives on context adapted as they addressed the 
most salient and timely constraints. This results in the designer shifting between representational and 
interactional views of context rather than having one fixed perspective. This finding suggests that the 
methods of information architects for context-aware systems need to accommodate shifting perspectives
on context. We present details of this activity to contribute insight into the practice of information
architecture for context-aware systems.


1 Introduction


As the sites and situations in which people interact with information systems change, so change the concerns 
and practices of the people who design them. Recent years have seen a dramatic shift in the nature of 
human-information interaction in which mobile, opportunistic interactions have begun to rival or even 
surpass in number and significance the stationary, focused interactions that characterized the “PC era.” 
Moreover, an increasing number of the applications used to interact with information are “context-aware,” 
meaning they can adapt their behavior based on aspects of the context of use, such as a user’s activity, 
location, social relations, gestures, posture, environment, or affective state. As a result, designers of 
interactive information systems have begun to grapple with how context is accounted for in their work. In 
a sense, designers of information systems have long sought to understand the behaviors and context of the 
populations for whom they design to make systems that support the users’ goals (Beyer & Holtzblatt, 1999; 
Saffer, 2006). However, the goal of such understanding was to create a system that would work well enough 
for most potential users in most potential situations. Context-aware systems, in contrast, seek to proactively 
leverage awareness of the user’s context and adapt accordingly, leading to more complex applications whose 
behavior is generally more difficult to design, prototype, and evaluate than the relatively static systems 
they are replacing. To understand how the field might better support the design of this new generation of 
systems, it is essential that we understand how designers are currently confronting the challenges inherent
to context-aware systems. 

While much has been said about the role and nature of context, there has been less emphasis on 
understanding how context is viewed from the perspective of practicing designers. Designers are of particular 
interest in the process of creating context-aware systems because of the role they play in framing how the 
technology is situated in the world and therefore what constitutes the system’s context. Previous literature 
exploring the nature of context has largely been based on a pragmatic approach to what can be detected 
by computers (A. Dey & Abowd, 2000) or on a theoretic exploration of the nature of context (Dourish, 
2004). Recent work has helped to clarify the value that taking a practitioner’s perspective can provide to 
the larger research domain (Goodman & Wakkary, 2011; Stolterman, 2008). To take the practitioner’s 
perspective means that we learn to understand context not as a neutral, objective phenomenon based on 
technology or on theoretical models, but as a construct that reflects the views of designers. 

In this paper, we explore how the designer’s understanding of context is represented in their tools 
and practices while designing information systems and how this influences Information Architecture (IA) 
practice. It is our contention that how designers represent context reflects their concept of context, which 
necessarily influences the design of an information system. By exploring the representation of context in IA, 
we aim to understand the role of the designer’s view of context in their work. In this paper we seek to
answer the following questions: First, how is context represented in artifacts for IA? Second, what are the
implications of different methods of representation? And finally, what does this suggest for IA practice as
we move further into an era of context-aware information systems?

To answer these questions, we first outline the theoretical lens that we apply to understand context- 
aware design. We draw on Schön’s theory of design worlds to inform our understanding of the conceptual
space in which design work is conducted. Schön argues that through designers’ perceptions of actual or 
virtual worlds, they create the objects and relationships with which they interact and determine what exists 
in the design world (Schön, 1992). These design worlds are abstract spaces in which designers create and 
evaluate these objects and relations as they work to create an optimal design. Rather than investigating 
the object in the designer’s world, we explore how the designer creates these relationships. Our contention 
is that the designer’s formulation and representation of these relationships necessarily influence the type of 
technology that is suited to exist in that design world. This means that when a designer formulates how a 
new design will respond to context in their design world, it is based on an understanding of context that 
will guide how that design relates to the real world. As Schön notes, these design worlds may be unique to 
the designer or shared in the design community (Schön, 1992). This suggests that explicating these worlds,
and the behavior inherent to them, could help to establish a common ground to reason about modes of 
context-dependent interaction. To explicate these worlds, we draw on Charles Goodwin’s theory of 
professional vision (Goodwin, 1994), which describes the methods used by members of a profession to shape 
a domain of scrutiny. For example, an anthropologist and a farmer impose different meanings on to the 
same substance (or “domain”) — e.g., soil. Their analyses of the domain rely on different assumptions, 
methods of analysis, and systems of scrutiny. Goodwin argues that this process creates the knowledge that 
forms the theories, artifacts, and expertise that are distinctive to any professional domain. By analyzing the 
methods in the domain of design for context-aware systems, we can begin to illuminate the ways designers 
understand context as a domain of scrutiny. 

To achieve a better understanding of context-aware design practice, we conducted interviews with 
eleven designers of context-aware systems. During the interviews, designers provided us with the artifacts 
produced in the design of one context-aware system they had created (Figure 1). They then walked us 
through the design process detailing the role of the artifacts and practices in which they engaged. This 
allowed us to follow the design process from initial concept to a finished product, all from the designer’s 
perspective. This paper contributes a detailed description of the design processes of context-aware systems 
from the perspective of the designer. Our analysis revealed how designers’ use of artifacts transforms their
understanding of context from a more phenomenological perspective to an increasingly positivist
representation. It further revealed how the use of artifacts contributes to the generation of a vocabulary of 
codes to describe the contextual components of the system. These findings contribute unique insight to our 
understanding of current design practices. 


Figure 1: Artifacts from Participant 4 showing the progression of the design from an initial sketch, to a 
low-fidelity prototype, and then to a functional prototype. 


2 Related Work 


Our study is situated within a body of literature examining the role of context in IA (Morville & Rosenfeld, 
2008) with special attention paid to the practices of designers. While prior work has investigated the design 
practice of Information Architects (Busch-Geertsema, Balbo, Murphy, & Davey, 2005) this work did not 
consider the role of context in the designers’ work. Beyer & Holtzblatt (Beyer & Holtzblatt, 1999) have 
outlined the importance of understanding context of use for design. Our work differs by focusing on 
information systems that are aware of the their contexts of use and the implications this imposes on design. 
Design methods such as Experience Prototyping (Buchenau & Suri, 2000) and the closely related Speed 
Dating (Davidoff, Lee, Dey, & Zimmerman, 2007), allow designers to rapidly explore application concepts, 
their interactions, and contextual dimensions. While these methods create compelling ways to explore 
context in design work, they do not provide insight into how designers are approaching the IA of 
context-aware systems; nor do they provide much guidance into how the understandings of user context are 
captured, represented, communicated, and made relevant to subsequent design activities. 

Our work draws on empirical studies of designers and previous work attempting to characterize 
context. An important contribution to our framing comes from Paul Dourish’s work investigating context 
and its role in design (Dourish, 2004). Dourish’s work outlines the value of viewing context from a 
phenomenological perspective in which context arises from our interaction in the world, drawing on 
everyday, cultural, common-sense understandings of the nature of the social world. He contrasts this 
perspective with positivist accounts of context where it is viewed as a set of attributes of the world that 
can be objectively observed and enumerated. The positivist view of context is represented by Schilit et al. 
who define context as “where you are, who you are with, and what resources are nearby” and “lighting, 
noise level, network connectivity, communication costs, communication bandwidth, and even the social 
situation (Schilit, Adams, & Want, 1994).” Dey expanded the idea of context to include “any information 
that can be used to characterize the situation of an entity (A. K. Dey, 2001).” All three contributions are 
valuable perspectives, but they are formulated as academic positions on the nature of context and do not 
necessarily represent how context is viewed by practitioners designing context-aware systems. 

The value of taking the designer’s view has been an area of considerable interest. In particular, 
design researchers have sought to ensure that work in HCI design theory remains relevant to the larger 
community of design practitioners (Rogers, 2004; Stolterman, 2008). Goodman et al. note that many HCI 
frameworks and theories have had limited impact on professional design practice and assert that this 
disconnection reflects the inadequate attention paid to the complexity of design practice (Goodman & 
Wakkary, 2011). Similarly, we hope to extend the research community’s understanding of context by 
providing a detailed account of how context is viewed by practitioners. 

Previous research has supported the design and development of context-aware applications, 
including application toolkits (A. Dey & Abowd, 2000) and infrastructure support (Hong & Landay, 2001) 
to facilitate the rapid development of context-aware applications. Additionally, prior work on design tools 
for context-aware applications has explored the capture and playback of events for conducting user tests 
of context-aware systems (Welbourne, Balazinska, Borriello, & Fogarty, 2010) and the playback of real 
world users’ behavior (M. W. Newman et al., 2010). These efforts have produced a number of insights into 
the technical requirements for supporting context-awareness and the potential for easing the burden of 
development, but they have been primarily aimed at software developers who have a different set of 
concerns, practices, and skills than Information Architects. 

To support the design of context-aware systems, researchers have taken several approaches. Prior 
work has sought to use design patterns to support the design of ubiquitous computing systems (Chung et 
al., 2004; Landay & Borriello, 2003). While designers did find the proposed patterns useful, the patterns 
were based on a review of the research literature instead of being informed by observed design practices. 
Dow et al. conducted a series of interviews with designers to investigate design practices for ubiquitous 
computing systems, which is a superset of the context-aware systems that form our focus (Dow, Saponas, 
Li, & Landay, 2006). This work revealed the importance of storytelling in design practices, especially when 
trying to communicate expectations about the context of use. However, Dow et al.’s work largely focused 
on issues influencing the development of tools to support ubiquitous computing designers rather than the 
designers’ understanding of context. 


3 Study Methods 


We conducted 11 video-recorded interviews with designers who had worked on recent projects that gave 
“special consideration to the users’ context,” as quoted from our recruitment email. We chose this framing 
to capture projects that designers described as being especially context-driven, even if the designers did not 
strive to meet the technical definition of “context-aware” (defined by Schilit et al. as software that “adapts 
according to the location of use, the collection of nearby people, hosts, and accessible devices, as well as to 
changes to such things over time” (Schilit et al., 1994)). Following prior studies of design practice (Dow et 
al., 2006; M. Newman & Landay, 2000), we focused each interview on a single project on which the 
designer(s) worked in the recent past. We recruited via messages posted to several interaction design mailing 
lists, through personal contacts working in industry or research labs, and by directly contacting designers. 

After conducting 10 interviews, we reviewed the goals of our study and decided to exclude two of 
the interviews from further analysis. These interviews were excluded because the process failed to meet the 
definition of design that we adopted from Preece et al., who characterize design as involving the 
“development of ... a plan or scheme.” (Preece, Rogers, & Sharp, 2003). The methods employed for these 
systems focused too heavily on implementation rather than design, and therefore were not useful for 
understanding design practices. Because we chose to exclude these interviews, we contacted several more 
designers and conducted three additional interviews until we began to see general themes emerge in the 
practices of the designers and therefore had achieved a point of data saturation (Bowen, 2008; Lincoln & 
Guba, 1985). Thus, this paper presents findings from 11 interviews (see Table 1), which included a total of 
14 designers, as two of the interviews (1 and 8) were conducted with multiple designers. We took a broad 
view on what constituted a designer in our interviews, but all 11 interviews were conducted with individuals 
whose practices met Preece et al.’s definition of design (Preece et al., 2003). 


ID    Years of Design   System Description                               Platform
      Experience
1     A: 1              Location-aware smartphone application for        Mobile phone, iOS
      B: 2              locating restaurants
      C: 0
2     12                Wearable location and activity sensing           Windows mobile, custom sensor suite
                        smartphone application for logging
                        physical exercise
3     10                Small (~1 in²) display tiles capable of          Custom-built tangible interactive
                        recognizing gestures and proximity of            device
                        other tiles
4     3                 Suite of sensors and portable educational        Custom-built touch screen device
                        tool for high school science students            with a suite of sensors
5     9                 On-body activity and location sensing            Custom hardware and mobile phones
                        device and smartphone interface
6     5                 Tangible device used to promote                  Custom tangible device
                        mindfulness of power consumption
7     4                 Interactive TV and ambient interface for         Custom hardware and commercial
                        socializing through the television               televisions
8     A: 45             Location-aware smartphone game                   Mobile phone, iOS
      B: 6
9     10                Location-based desktop and smartphone            Web applications
                        app for healthy lifestyle recommendations
10    5                 Location-based smartphone application            Mobile phone, iOS
                        for managing time
11    4                 Mobile application for sharing online            Mobile phone, iOS
                        shopping experience

Table 1: Participant experience and the systems designed. Interviews 1 and 8 were conducted with multiple 
designers. 


4 Artifact Analysis 


In this section, we present an analysis of the interviews and the artifacts created by participants in the 
design process. To conduct our analysis, we drew on Goodwin’s definitions for three practices that he argues 
help to frame the socially organized ways of seeing and understanding events that are distinct to a particular 
social group (Goodwin, 1994). Specifically, we examined the designers’ work looking for instances of the 
following practices: 


• Highlighting: Making specific phenomena in a complex perceptual field salient by marking them 
in some fashion. 

• Coding: Transforming the materials being attended to in a specific setting into the objects of 
knowledge that animate the discourse of a profession. 

• Creating representations: The production and articulation of material representations. 


Goodwin argues that these practices characterize any professional domain, and, in fact, how they are 
performed establishes the basis of a profession (Goodwin, 1994). Thus, we do not argue that these three 
practices are unique to design, but that by examining the interviews in terms of these practices, we begin 
to see how designers understand context as part of their profession. In our analysis of the artifacts and 
interviews, we attempted to remain neutral about what context could mean. Therefore, rather than 
analyzing the interviews by looking for examples of what we believed context to be based on a literature 
review or our personal intuition, we instead looked for representations or practices that are not accounted 
for in conventional computing interfaces (Hutchins & Hollan, 1985) or that relied on implicit interaction 
(Schmidt, 2000). Additionally, we looked for instances where the designer specifically discussed context. 

To analyze the interviews, the researchers viewed the interviews and the artifacts numerous times 
looking for examples of highlighting, coding, and creating representations. Instances where these practices 
were used were then compared across the designers to look for commonalities and differences. Below, we 
discuss the artifacts and practices which were used to explore the information flow of the system including 
items such as wireframes, schematics, and sitemaps (M. Newman & Landay, 2000). We do not claim that 
these artifacts represent an exhaustive grouping of all artifacts that are used for understanding IA of 
context-aware systems. Rather, these are artifacts we encountered that were useful to designers in 
understanding how context and the IA of the applications intersected. Our findings are presented below, 
organized according to the practices of professional vision. 


4.1 Highlighting Context 

Highlighting is the process of making a phenomenon in a perceptual field more salient (Goodwin, 1994). In 
the artifacts we analyzed, we found numerous examples of context being highlighted by designers. To 
facilitate our discussion of how context was highlighted in design artifacts, it is useful to introduce Dobson’s 
work on the subtleties of location (Dobson, 2005). Dobson created a taxonomy of ways that an individual’s 
location can be determined. At the top of his taxonomic hierarchy, Dobson suggests that locations can be 
broken into three categories: known, approximate, and negative. For example, you might know that the user 
is at work (known), that they are on their way to work (approximate), or that they aren’t at work (negative). 
Furthermore, Dobson’s taxonomy distinguishes knowing that someone is at work (named space) from 
knowing his or her exact location at work (absolute position). This is only a small subset of terms provided 
by Dobson to describe location, but it helps us begin our discussion of the varied aspects of location that 
might be highlighted by designers. 
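
The distinctions summarized above can be sketched as a small data structure. This is only an illustrative sketch, not part of Dobson's taxonomy or of any participant's system: the class names, the `LocationClaim` record, and the `describe` helper are hypothetical, and treating certainty and granularity as two separate fields is our simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Certainty(Enum):
    """The three top-level categories mentioned in the text."""
    KNOWN = "known"              # e.g. the user is at work
    APPROXIMATE = "approximate"  # e.g. the user is on the way to work
    NEGATIVE = "negative"        # e.g. the user is not at work


class Granularity(Enum):
    """The finer distinction the text draws for a location."""
    NAMED_SPACE = "named space"              # "at work"
    ABSOLUTE_POSITION = "absolute position"  # exact position at work


@dataclass(frozen=True)
class LocationClaim:
    """Hypothetical record of what a design artifact asserts about location."""
    label: str
    certainty: Certainty
    granularity: Optional[Granularity] = None  # assumption: not always stated


def describe(claim: LocationClaim) -> str:
    """Render a claim as a short annotation, such as a note on a wireframe."""
    parts = [claim.certainty.value]
    if claim.granularity is not None:
        parts.append(claim.granularity.value)
    return f"{claim.label} ({', '.join(parts)})"


at_work = LocationClaim("at work", Certainty.KNOWN, Granularity.NAMED_SPACE)
commuting = LocationClaim("on the way to work", Certainty.APPROXIMATE)

print(describe(at_work))    # at work (known, named space)
print(describe(commuting))  # on the way to work (approximate)
```

Keeping the two fields separate makes it easy to see which aspect of location an artifact highlights and which it leaves unstated.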

The work of the designers from Interviews 1 and 9 helps to illustrate the distinct ways that location 
was viewed, and how this influenced what was highlighted. The designers from Interview 1 developed an 
iPhone application that suggested businesses to the user based on their location. The designers used text to 
highlight location and used a known named space for the location. In the wireframe we can see that the 
system presents results that are near the user’s location “Catherines’ Glen” (see Figure 2). On the side, the 
designers have written a note stating “Cross-street?” suggesting that cross-streets may be a more 
appropriate way to communicate the location to the user (see Figure 2). 


Figure 2: One frame from Interview 1 wireframes. In this frame, we can see that the designers used a known 
named space to describe the user’s location. 


In contrast to how the designers from Interview 1 highlighted location, in Interview 9, the designer depicted 
location as both an approximate and known location. Designer 9’s work was on a system that aimed to 
provide healthy lifestyle options to users along their commute. Based on the user’s location and destination, 
the system plots a route and then suggests healthy places to eat or exercise along that route. Designer 9’s 
annotation to the wireframe states, “line plots first — dots come after” next to a diagram of a map (see 
Figure 3). The line highlights the location as a route—an approximate location—and the dots highlight 
specific known locations. By highlighting both the approximate and known locations, the designer can use the 
wireframe to depict the activity of commuting as well as the system’s recommendations, thereby allowing 
the designer to preserve multiple aspects of the user’s context. The designer was clearly aware of this 
implication, and stated: 


Designer 9: “In terms of like location and context, that is a trickier one. It is one of those sort of 
underlying things that is part of a design that is just — to me I don’t think of -- when I approach a 
design I don’t think of location and context explicitly, it is part of — it is almost implicit. It is just 
one of those elements that you are going to take advantage of the design or it is the framework 
that you are designing within.” 


Figure 3: Wireframe-flow diagram from Designer 9 shows how the text highlights the way search results are 
driven by the user’s commute. 


The quote and the features of context highlighted in the wireframe provide some insight into how this 
designer viewed context. We can see that the designer clearly viewed context as being multifaceted. Also, 
we can see that he viewed each of the components of context as being interrelated, not as separate delineable 
components. This contrasts with the view of the designers from Interview 1, who only highlighted context in 
terms of the location the user occupied. This difference returns us to Dourish’s work exploring positivist 
and phenomenological perspectives on context (Dourish, 2004). The designers from Interview 1 clearly 
highlighted aspects of context that could be described in positivistic terms, whereas the work of Designer 9 
is better described as phenomenological. The differences between these ways of viewing context are more 
than purely academic; they have implications for how the system is designed as well. When context is 
viewed phenomenologically, as it was by Designer 9, it creates 
more ambiguity than the positivist perspective. This ambiguity allows the designer to more openly explore 
potential meaning for the system. However, it also creates tension between how the designer views context 
and how they can capture this view in their design artifacts, and ultimately their final design. In the 
following section, we can begin to see how designers’ use of coding enables them to work with this ambiguity. 


4.2 Encoding Context 


According to Goodwin, coding is the process of transforming the domain being attended to into objects of 
knowledge (Goodwin, 1994). We found that coding often drew on the phenomena designers highlighted in 
their work. What distinguishes coding from highlighting is that highlighting draws attention to features of 
context, whereas codes define the highlighted features or the behavior of the system in response to those features. 
Our observations of how coding figured into the designers’ practices revealed two prominent themes. First, 
coding allowed for the instantiation of a vocabulary about context. This instantiation occurred when the 
novel aspects of these systems required designers to develop new vocabulary to communicate ambiguous 
aspects of the context. Second, these codes were then used as a method to communicate design constraints 
or technological requirements throughout the design process. 

We found a particularly illuminating example in our interview with Designer 2. During the 
interview, she discussed her work developing a smartphone application that would detect and respond to 
the user’s exercise activities, such as running, walking, or riding a bike. She used four different colors of 
sticky notes to represent the screens of the application. The sticky notes of a certain color were used to 
reflect screens that would pertain to the type of context detected (Figure 4). The designer used orange for 
location, blue for time, purple for activity, and pink for “system triggered,” meaning that no signal was 
detected and therefore the system was possibly not being used. Each sticky note represented a UI screen or 
a concept that a screen would need to be designed to accommodate. The designer’s color-coding of the 
context highlights how the screens are organized, and also helps to establish a coding schema (Goodwin, 
1994) for the various components of context dealt with. The codes may seem unremarkable—location, time, 
activity, and no signal—but they were an important enough way to organize the wide range of activities of 
the system that the designer felt it necessary to meticulously organize the information according to this 
coding system. In addition, laying out the information this way enabled the designer to consider whether the system would 
be capable of detecting the components of context in question. She stated: 


Designer 2: “And see here, you can see in this one there’s a question that we wrote and there’s a 
big [collaborating developer’s name] with a question mark so we would bring him over and there’s 
another [collaborating developer’s name] with a question mark. So [collaborator] was working with 
[different collaborator] and so when we would hit some ‘Ooh is that even possible?’” 


Figure 4: Designer 2’s sitemap shows how the color of the sticky notes encodes the relevant forms of context 
to which the screens correspond. 


This quote illustrates two interesting points. First, the designer used this framing of context to consider the 
feasibility of the system, which is necessary in the production of any technology. Second, it shows that 
encoding context in this way allowed her to engage with other collaborators to resolve the issues of 
feasibility. This suggests that the codes became the vocabulary with which the feasibility of context- 
awareness was discussed. Of additional interest is the fact that this artifact persisted in a public space in 
the participant’s lab for weeks, which enabled her to discuss the system at length, thereby helping to solidify 
and propagate the coding schema. 


Designer 2: “So we would bring people to the board constantly and people were fascinated with 
us at the board. I think just having these artifacts and colored things and us just sitting there 
staring brought a lot of people over and they would ask about it, follow the progress.” 


The fact that this artifact operated as a conversational locus suggests that coding is a social process. 
This helps to establish that the codes were useful, not only in reasoning about the system, but also in 
creating a common language to discuss the system. When the designer initially used color-coding to define 
the functional sections of the sitemap, she did not consciously encode a given way to discuss the system. 
However, through color-coding, she facilitated discussions of the system by creating an easy way to see how 
the context of the system would influence the architecture. This allowed her to discuss the implementation 
with the engineers working out how context was detected by the system. The example of Designer 2’s color- 
coded sitemap begins to suggest how codes can help to span design worlds. 
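
Designer 2's color-coded sitemap, as described above, can be read as a simple lookup from sticky-note color to context type. The sketch below is only an illustration of that reading and is not the designer's own tooling: the four color codes come from the interview, while the screen names and the `group_screens_by_context` helper are hypothetical.

```python
# Color codes as reported for Designer 2's sitemap.
CONTEXT_COLORS = {
    "orange": "location",
    "blue": "time",
    "purple": "activity",
    "pink": "system triggered",  # no signal detected
}


def group_screens_by_context(notes):
    """Group (screen_name, color) sticky notes by the context each color encodes."""
    groups = {}
    for screen, color in notes:
        context = CONTEXT_COLORS[color]
        groups.setdefault(context, []).append(screen)
    return groups


# Hypothetical screen names, used only to exercise the lookup.
notes = [
    ("Route summary", "orange"),
    ("Daily log", "blue"),
    ("Run in progress", "purple"),
    ("Nearby trails", "orange"),
]

groups = group_screens_by_context(notes)
for context, screens in groups.items():
    print(context, screens)
```

Grouping screens this way mirrors how the colors let collaborators see at a glance which parts of the architecture depend on which form of context.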

Designer 2’s work demonstrates how current IA artifacts are augmented by coding schemas to 
communicate how context influences the state of an information system. Designer 2’s use of color-coding 
added an additional dimension to the 2D space that the sitemap occupies. By providing an additional 
dimension to the sitemap, coding acts as a useful mechanism to communicate how context influences the 
system. While Designer 2’s work provides a vivid example of the use of coding, she was far from unique in 
using codes to express the context of the system (though she was the only designer to use color to this end); 
in each interview, we found that the designers added an additional dimension to sitemaps and wireframes 
through the use of coding. This was most commonly done through the use of text annotation on wireframes 
and sitemaps, but we also saw designers use illustrations, color-coding, and moving the location of notes to 
communicate changes in the user’s location. To summarize the different ways that coding was used in the 
interviews, below is a table of the information artifacts from the interviews and the modes of coding they 
contained (see Table 2). These examples of coding begin to suggest the importance of how context is 
represented in design practice. In the following section, we continue to explore how designers viewed context 
by detailing the role of representation. 


ID    Wireframe             Sitemap           Combined wireframe/sitemap
1     [illegible]           [illegible]       Text, illustrations
2     None                  Color-coding      Text, color-coding
3     [illegible]           [illegible]       Text, illustrations
4     Illustrations         Text              [illegible]
5     None                  [illegible]       Text, illustrations
6     [illegible]           Text              [illegible]
7     [illegible]           [illegible]       Text
8     Text                  [illegible]       Text, illustrations
9     Text, illustrations   Text, location    [illegible]
10    Text, illustrations   None              [illegible]
11    [illegible]           [illegible]       Text


Table 2: Summary table of the artifacts used by designers and which methods they used to encode context 
in the artifacts. In instances where designers used multiple wireframes or sitemaps, the methods of encoding 
used in the artifacts are combined. 


4.3 Representing Context 


The contrast between the designers’ work has helped to illuminate the diverse perspectives on the nature 
of context. The contrast illustrates how the designers adapt artifacts to reflect varying perspectives and 
approaches to understanding context. While all artifacts are representations of elements of the design space, 
we found that designers endeavored to include representations of context in their work. To accomplish this, 
the designers created multiple representations of context and the system, which allowed them to overlay 
components of context on the system. One example of this practice came from Interview 8. The designers 
drew a simple grid representing a map on a glass table and a mockup of the interface on a dry erase board. The 
designers would make changes to the location by pointing to the squares in the grid drawn on the table. 
They would then see if the changes in location could be accommodated by a wireframe of the interface 
(Figure 5). This created a way to represent the location and the system in tandem. By juxtaposing these 
representations, the designers were able to see how the user’s implicit interaction in the world and their 
explicit interaction with the system would influence the state of the system. 


Figure 5: A grid drawn on a glass table and a wireframe drawn on a white board. The drawings on the 
table were used in conjunction with the wireframe to show how changes in location (drawn on the glass 
table) would affect and be represented in the interface. 


This method of representing context was unique to the designers of Interview 8. However, we did see 
additional methods to address how context influenced the system. The other methods we encountered in 
the interviews involved more complicated techniques, such as experience prototyping (Buchenau & Suri, 
2000) or developing functional prototypes. This suggests that methods to represent context and its influence 
on a system are still being developed, but are indeed needed. An interesting aspect of this technique is that, 
by representing the context, the designers necessarily reduce what could account for context to a single 
dimension, namely location. Thus, it seems that while this method was helpful, it does narrow the designers’ 
perspective on what constitutes “context.” 


5 Discussion 


Based on our analysis of the interviews, we found that when working with context-aware systems, designers 
adapt “standard” information structure artifacts to highlight, encode, and represent context. Highlighting 
features of context within familiar artifacts causes designers to distill the complicated notion of context 
down to the pertinent factors needed to express the concept in a manner the artifact affords. This process 
leads designers to create a vocabulary of codes to express the behavior of the system. By adapting existing 
artifacts, designers can consider the constraints and opportunities of context alongside other long-standing 
concerns such as information presentation. This was clearly seen with Designer 2’s color-coding of context. 
Designer 2’s color-coding provided an additional dimension to help her reflect on the role of context, but 
additional dimensions cannot be added indefinitely. In light of this, we believe designers should be careful 
to acknowledge the features of context their artifacts represent, as well as those they omit. Through careful reflection 
on how context is highlighted, encoded, and represented, Information Architects will be better equipped to 
address the unforeseen consequences that arise from context-aware systems. 

The contrast between which artifacts were chosen and how they were applied sheds light on the 
tension between phenomenological and positivist views of context. Designer 9’s use of known and 
approximate locations while highlighting context demonstrates that both perspectives exist in contemporary 
design practice. Interestingly, we see that both perspectives are present within individual projects and are 
being applied by individual designers. While we are reluctant to make claims about consistent temporal 
patterns for activities and artifacts based on our study, our data suggests a provisional alignment of the 
phenomenological perspective with earlier design stages and of the positivist perspective with later design 
stages. This makes sense because as the system progresses, it becomes increasingly necessary that designers 
communicate the design constraints in ways that can be captured and processed by computer hardware and 
software. This finding suggests that design tools for context-aware computing may need to accommodate 
shifting perspectives on context. 

One limitation of our study is that we relied on artifacts and the designers’ recollections of practices. 
As we discussed above, it is our view that the process of creating artifacts necessarily influences the way 
context is represented. Relying on the artifacts may have influenced designers to think about the process in 
terms of the representations of the process, which may create a bias toward a positivist interpretation of 
context. Practices such as experience prototyping (Buchenau & Suri, 2000) serve to represent the designer’s 
concept of context, but relying on their memory of the practice undoubtedly loses some of the richness that 
being there would reveal. Despite this limitation, we do feel that the designers were able to discuss their 
design practices with sufficient detail for our analysis. However, an ethnographic study of context-aware 
design practice would be a valuable way to explore this topic in future work. 

Our analysis revealed the importance of generating and communicating codes to the members of 
the design team and the role that various artifacts play in the design of context-aware systems. This finding 
draws our attention back to Schön’s discussion of design worlds. Schön argued that design worlds may be 
unique to a designer or shared across a broader community (Schön, 1992). This suggests that an analysis 
of the specific codes could help to establish a common vocabulary for context-aware design. Along similar 
lines, Garrett has attempted to establish coding practices for general IA practice using a “visual vocabulary” 
(Garrett, 2002). Initiatives to facilitate designers’ communication around their context-aware design 
practices could serve to facilitate the emergence of standard codes or coding practices. As the field of IA 
moves forward, it is our belief that these coding practices will become essential to establishing widely 
adopted design patterns. 


6 Conclusion 


In this paper, we have sought to contribute to the understanding of context-aware design by analyzing the 
artifacts and practices of designers. Our findings suggest that designers employ differing views on the nature 
of context. Their views on context also change as they engage in the design of a system. This change seems 
to be generally from a more phenomenological perspective to a more positivist perspective and results from 
the processes of highlighting and representing context, which encode the relevant features. The produced codes 
are then used to communicate with other stakeholders in the design process and ultimately evaluate the 
design. The process is not a straightforward march, but relies on creating multiple representations of the 
context that are evaluated for different forms of context and by different stakeholders. Designers moved 
back and forth between understandings of context as they sought to satisfy design constraints. 

We conducted our analysis by relying on the theoretical lens of professional vision. This lens gave 
us insight into the way that designers viewed the domain of context-aware design. As the field of IA 
continues to mature, it will be informative to see how the practices and perspectives we outline in this paper 
transform. We present this work in the hope that it will help ground future IA practice in 
empirical work. We believe that through our analysis, we have demonstrated the value of 
professional vision as a theoretical lens for exploring design practice. Furthermore, we 
hope this work has helped to outline the value of attending to the role of the designer in the production of 
information systems. 


Abstract 

The National Security Agency’s various surveillance programs recently revealed by Edward Snowden are 
collectively arguably the largest personal data collection and analysis operation in history. While the 
foremost exemplar of a fine-grained, global information system, they also represent one of the most 
serious contemporary challenges to democratic governance and civil liberties. Based on media coverage 
and leaked secret documents, this paper analyses the main NSA data interception programs and their 
geographic characteristics. This research also draws on IXmaps.ca, a crowd-sourced, interactive mapping 
application, to show internet users where their personal traffic may be intercepted by the NSA. In 
particular, it demonstrates that internet surveillance facilities located in relatively few strategic locations 
enable a nearly comprehensive collection of domestic U.S. internet traffic. 
1 Introduction 


The 2013 revelations of U.S. National Security Agency (NSA) surveillance programs brought to public 
attention by whistleblower Edward Snowden’s release of hitherto secret internal documents have sparked a 
storm of controversy. Their breathtaking scope and scale, and their questionable legality, largely confirm the earlier 
allegations of clandestine domestic spying by retired AT&T technician Mark Klein (2009), author James 
Bamford (2008), and others (Landau, 2011). Klein reported in 2006 that the NSA had secretly installed 
surveillance equipment in AT&T’s main San Francisco internet exchange point capable of copying and 
analyzing potentially all the internet traffic passing through that location. He indicated that similar facilities 
were operating in other AT&T switching centres. Termed NSA ‘warrantless wiretapping’, this was arguably 
the largest single state surveillance program conducted over the communications of its citizens (Bamford, 
2008). Congressional passage of special legislation in 2008 to protect telecommunications carriers against 
the dozens of lawsuits that ensued largely ended media attention to the controversy at that time, but left 
the constitutional and civil liberties issues unresolved, and considerable mystery remaining about what the 
NSA is actually doing. 

Thanks to Snowden, as well as the reporters he handed the trove of 58,000 NSA secret documents 
to, it is now much clearer that the warrantless wiretapping program was the tip of a much bigger iceberg, 
covering all forms of telecommunications traffic and implicating most of the major telecommunications 
carriers across the U.S. For the first time in decades, the widely publicized revelations have prompted a 
lively discussion in the U.S. as well as elsewhere about the appropriateness and legality of NSA surveillance. 
While the almost weekly breaking news stories are producing an increasingly detailed collage of hitherto 
secret NSA operations, there is still much more we need to learn and understand to have the informed 
public debate now long overdue. Part of the challenge is to make sense of the welter of fragmentary reports, 
to yield a more coherent picture of the NSA’s vast surveillance infrastructure. This paper seeks to contribute 
to the public debate now getting underway by shedding light on the widespread state surveillance of 
everyday citizen internet communication. It focusses especially but not exclusively on internet surveillance 
within the U.S., and not so much on the vital legal, political and moral issues, but more on the technical, 
geographically specific aspects — what data is collected, whose personal data, where and how. 

During the Cold War, the NSA, as primarily a signal intelligence operation, concentrated on 
capturing over-the-air transmissions, such as via satellite or micro-wave relay, that could be intercepted 
passively by setting up antennae within broadcast range. Tapping into wireline communication was similar 
in that signals in analog transmissions typically ‘leaked’ and sensitive receivers placed along the lines could 
pick up the transmissions. The widespread shift during the 1990s to digital networks, notably using fibre 
optic cables, rendered the conventional interception modes obsolete and there were concerns within 
intelligence agencies about “going dark” (Bamford, 2008). Because there is typically little or no electromagnetic 
or optical signal leakage from digital transmissions, interception initially became much more 
challenging, not only technically, but also politically and organizationally. This is because it meant 
physically gaining access to the transmission equipment or breaking into the transmission path. However, 
just as the capabilities of digital communications, storage and analysis have soared, so too has the capacity 
to surveil digital activity on a massive scale. Besides the extraordinary technical prowess the U.S. is able 
to deploy in the service of its perceived surveillance and security needs, the U.S. also has a strategic 
geographic advantage in that a disproportionate share of international data communications passes through 
its territory, an advantage the NSA is well aware of (see Figure 1). 
Figure 1: U.S. as World’s Telecommunications Backbone¹ 


These expansive changes in surveillance technique have been driven not only by advances in the underlying 
information technologies, but also by geopolitical changes in the nature of threats perceived by national 
security and law enforcement agencies and what are appropriate means to address them. These changes 
also have important implications for how (potentially) surveilled subjects can respond to them and more 
generally how democracies can govern the powerful forces unleashed by them. A geographic 
perspective on NSA surveillance, and on its interception capabilities in particular, is helpful in understanding 
where it can be done, what parties are implicated, what legal jurisdictions apply and how relations among 
the various diverse actors distributed across the nation and around the world are affected. 


1 Source: Washington Post, NSA slides explain the PRISM data-collection program 
June 6, 2013, Updated July 10, 2013. http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/ 
This paper, drawing on surveillance studies perspectives (Lyon, 2007), views NSA interception not 
as an isolated occurrence, but as reflecting a wider societal phenomenon in which surveillance, “monitoring 
people in order to regulate or govern their behaviour” (Gilliom & Monahan, 2013, p. 2), serves as a central 
organizing principle. Surveillance is often benign, even essential, but is becoming so pervasive and 
inextricably connected to everyday activities that we can characterise our contemporary ‘western’ life as a 
surveillance society. At the same time, it is important to recognize that notwithstanding its burgeoning 
extent and intensity, surveillance and its effects are not uniform and do not affect everybody, everywhere in 
similar ways. In these respects, the NSA surveillance programs offer an albeit extreme, potentially malignant 
but nevertheless revealing exemplar of systemic surveillance trends in our increasingly digitally mediated 
contemporary society. 

We begin by describing briefly the NSA’s most prominent programs, with a focus on those that 
collect the raw data on which subsequent analysis and action is based. This provides the opportunity to 
highlight the specific kinds of personal data that are collected, from whom and where. The evidence for 
these programs comes almost exclusively from secret NSA documents released by Snowden, as reported 
from June 5 to the end of December 2013, mainly in the Guardian and Washington Post newspapers. The 
authenticity of these leaked documents, with very few exceptions, is generally acknowledged, including by 
the US government. The following section examines in more detail the NSA internet surveillance conducted 
in the US at major internet exchange points, referred to as the ‘warrantless wiretapping’ program. We again 
rely mainly on mainstream news reports, but in the period 2005-2012. The third section builds on what was 
reported at that time about the location and nature of the NSA US domestic internet surveillance facilities 
to explore empirically whether individually generated internet routes may be exposed to NSA warrantless 
wiretapping. This is done using a research-based internet mapping tool known as IXmaps, developed to 
map internet exchange points and the traffic routed through them. The software tool found at IXmaps.ca² 
aggregates crowd-sourced internet users’ ‘traceroutes’ and shows them where their personal traffic is likely 
to have been intercepted by the NSA. In contradistinction to the common metaphor of the internet as a 
space-less, featureless ‘cloud’, we demonstrate that with interception points in relatively few major cities 
(<20) the NSA is capable of intercepting a large proportion (>95%) of domestic internet traffic. We close 
by reflecting on the role that a geographic visualization tool may play in facilitating public understanding 
of internet surveillance and by calling for discussion within the information studies field about its ‘darker 
side’. 
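
The coverage claim above reduces to a simple count: the share of collected routes that touch at least one suspected interception city. A minimal Python sketch of that calculation follows. It is illustrative only and is not the IXmaps implementation; the function name, the city set and the sample routes are hypothetical, and each traceroute is assumed to have already been reduced to the list of cities its routers were geolocated to.

```python
from typing import Iterable, Sequence, Set


def share_of_routes_through(routes: Iterable[Sequence[str]], cities: Set[str]) -> float:
    """Return the fraction of routes with at least one hop in `cities`."""
    routes = list(routes)
    if not routes:
        return 0.0
    # A route counts once, no matter how many suspected cities it crosses.
    hits = sum(1 for hops in routes if any(city in cities for city in hops))
    return hits / len(routes)


# Hypothetical traceroutes, each given as the ordered list of cities traversed.
example_routes = [
    ["Seattle", "San Francisco", "Los Angeles"],
    ["Boston", "New York", "Chicago"],
    ["Denver", "Dallas"],
    ["Boise", "Salt Lake City"],
]

# Hypothetical set of suspected interception cities (not the paper's list).
suspected = {"San Francisco", "New York", "Dallas"}

print(share_of_routes_through(example_routes, suspected))  # 0.75
```

With real data, the same count would be run over every US-only route in the database and over the full list of candidate cities.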


2 NSA Surveillance Programs 


The reporting of NSA surveillance activities, beginning in June 2013 when whistleblower Edward Snowden 
handed over secret documents to reporters Glenn Greenwald and Laura Poitras, for the first time brought 
to wide public attention details of the NSA’s comprehensive array of data collection, archiving, mining, 
analysis and visualization programs. While much of the legal and political controversy these revelations 
provoked has focused so far, at least in the U.S., on whether access to collected data about Americans is 
legal or even constitutional, of wider significance is the existence of a global apparatus designed for and 
capable of intercepting virtually all electronic communications, ostensibly for ‘security’ purposes. These 
data accumulation activities in aggregate are enormous. To give an idea of their scope, William Binney, a 
former NSA mathematician and Technical Director of the World Geopolitical and Military Analysis 
Reporting Group, estimated in 2012 that the agency had "assembled on the order of 20tn transactions about 
US citizens with other US citizens", and this included "only ... phone calls and emails". In 2010, well before 
the Snowden revelations, according to this same Guardian article, the Washington Post reported that 
"every day, collection systems at the [NSA] intercept and store 1.7bn emails, phone calls and other 
type of communications."³ 


2 http://ixmaps.ca 


More recently, the Washington Post reported that the NSA “is gathering nearly 5 billion records a day on the 
whereabouts of cellphones around the world.”⁴ The breathtaking, global geographic scope of the NSA’s data 
gathering capabilities is demonstrated in its map of the ‘Worldwide SIGINT/Defense Cryptologic Platform’ 
(see Figure 2). 
Figure 2: NSA’s worldwide data gathering surveillance infrastructure⁵ 


Whose data the NSA can capture, where and how has now become a topic of widespread concern. While 
questions of what is actually done with the personal data the NSA collects, and what protections may or 
may not exist, at least for “US persons”, are of course at least as important, we focus here on the fundamental 
issue of original capture technique and locale. Of the numerous and varied NSA programs we now 
know something about, we’ll explore here five that reveal most directly the geographic scope and personal 
informational details of mass NSA surveillance. We begin with two analysis and visualization programs, 
Boundless Informant and X-Keyscore, that give a broad overview of the scale and intensity of surveillance, 
and then examine more closely three programs directly related to interception, each of which adopts a 
distinctive technique: bulk telephony meta-data collection, Prism, and Upstream. 


2.1 Boundless Informant 

The aptly named, top secret Boundless Informant program is a data mining tool described in an official 
Global Access Operations (GAO) FAQ as providing “the ability to dynamically describe GAO’s collection 
capabilities (through metadata record counts) with no human intervention and graphically display the 
information in map view, bar chart or simple table”.⁶ A GAO slide presentation claims it uses “Big Data 


3 Glenn Greenwald, XKeyscore: NSA tool collects ‘nearly everything a user does on the internet’, Guardian, 31 July 2013. 
http://www.theguardian.com/world/2013/jul/31/nsa-top-secret-program-online-data 

4 Barton Gellman and Ashkan Soltani, NSA tracking cellphone locations worldwide, Snowden documents show, Washington Post, 4 
December 2013. http://www.washingtonpost.com/world/national-security/nsa-tracking-cellphone-locations-worldwide-snowden-documents-show/2013/12/04/5492873a-5cf2-11e3-bc56-c6ca94801fac_story.html?hpid=z1 

5 Floor Boon, Steven Derix and Huib Modderkolk, NSA infected 50,000 computer networks with malicious software, NRC.NL, 23 November 
2013. http://www.nrc.nl/nieuws/2013/11/23/nsa-infected-50000-computer-networks-with-malicious-software/ Note that 
this image does not show the domestic interception within the Five Eyes countries, which we’ll discuss later in relation to the U.S. 

6 Guardian, Boundless Informant: NSA explainer – full document text, June 8, 2013. 
http://www.theguardian.com/world/interactive/2013/jun/08/boundless-informant-nsa-full-text 

technology to query SIGINT [signal intelligence] collection in the cloud to produce near real-time intelligence 
describing the agency’s available SIGINT infrastructure and coverage.”⁷ While this tool doesn’t access all 
the information the NSA collects (e.g. information covered by FISA restrictions is not included), the volume 
is impressive. The Guardian reports that “in March 2013 the agency collected 97bn pieces of intelligence 
from computer networks worldwide”, including nearly 3bn in the U.S. See Figure 3 for a partial ‘heat map’ 
showing relative amounts of data collected in various countries that month. While a few countries show 
particularly intense collection (marked in red), this map shows that interception is widespread around the 
globe. 


[Map label: United States, 2,892,343,446] 


Figure 3: Heat map of NSA meta data collection in March 2013⁸ 


2.2 X-KeyScore 


X-KeyScore is a query tool designed to allow authorized analysts to interrogate through a desktop interface 
the NSA’s vast world-wide intelligence holdings. ‘Selectors’, such as an email address or IP address, can be 
used to access stored data as well as initiate “ongoing ‘real-time’ interception of an individual's internet 
activity.” Due to the large volumes collected, data is initially held close to the point of capture and much of 
it is deleted after a few days.⁹ Figure 4 shows the world-wide distribution of these caches, again showing 
the broad geographic scope of NSA surveillance. It is not just NSA analysts who have access to this data 
but also, as the top of the slide shows, the security agencies in the other “Five Eyes” countries: Australia, 
Canada, Great Britain and New Zealand. There are also reports of access by intelligence services in other 
US allies such as Germany¹⁰ and Israel.¹¹ 


7 Guardian, Boundless Informant NSA data-mining tool – four key slides, June 8, 2013. 
http://www.theguardian.com/world/interactive/2013/jun/08/nsa-boundless-informant-data-mining-slides 

8 Source: Glenn Greenwald and Ewen MacAskill, Boundless Informant: the NSA's secret tool to track global surveillance data, 
Guardian, 11 June 2013. http://www.theguardian.com/world/2013/jun/08/nsa-boundless-informant-global-datamining 

9 Sean Gallagher, Building a panopticon: The evolution of the NSA’s XKeyscore, ArsTechnica, Aug 9 2013. The need for these globally 
distributed caches may well be temporary, given the massive Utah data centre that Bamford reported on in 2012, originally scheduled 
to open September 2013. See: Bamford (2012). 

10 Der Spiegel, "'Prolific Partner': German Intelligence Used NSA Spy Program", July 20, 2013. 

11 Glenn Greenwald, Laura Poitras and Ewen MacAskill, NSA shares raw intelligence including Americans' data with Israel, Guardian, 
11 September 2013. http://www.theguardian.com/world/2013/sep/11/nsa-americans-personal-data-israel-documents 
Figure 4: Location of data caches accessible by X-Keyscore¹² 


We turn now to looking at the major surveillance programs revealed so far that generate these vast troves 
of intelligence data. 


2.3 Bulk telephony meta-data collection 


The NSA has long been suspected of collecting the telephone calling records of Americans, and there have 
been court cases challenging this practice dating back to 2008. The main charges were against AT&T as 
well as Verizon/MCI, BellSouth, Sprint, and Cingular.¹³ However, the FISA Amendments Act (FISAA) of 
2008, popularly known as the “Telecom Immunity Act”, rendered these cases moot, as this legislation 
“allow[s] federal judges to waive lawsuits if the telecom firms can prove that they were authorized by the 
president and assured that the program was legal.”¹⁴ There were several more court cases against the federal 
government, with the federal government so far successfully seeking to have each dismissed on ‘national 
security’ grounds, or because plaintiffs couldn’t establish ‘standing’, i.e. they weren’t able to prove that 
their telephone activities had been intercepted because the existence (or not) of such surveillance was itself 
a secret.¹⁵ However, the release, first by the Guardian and subsequently by the federal government, of the 
FISA Court order requiring Verizon to “produce to the [NSA] ... on a daily basis ... an electronic copy of ... 
all call detail records or ‘telephony metadata’”,¹⁶ confirmed the existence of the secret program and 
reignited the court challenges.¹⁷ Not just international calls are included, but all local calls as well. In other 


12 Glenn Greenwald, XKeyscore: NSA tool collects 'nearly everything a user does on the internet', Guardian, 31 July 2013. 
http://www.theguardian.com/world/2013/jul/31/nsa-top-secret-program-online-data 

13 Unlike the five telecom carriers that faced lawsuits, Qwest reportedly did not comply with the NSA's request to turn in customers' 
telephone records. Moreover, Qwest's CEO claims that this request was made in February 2001, well before the attacks of September 
11. In addition to requests for phone records, Qwest was also approached by unnamed “clandestine agencies” about allowing the latter 
the use of Qwest's “fiber-optic communications network for government purposes.” Qwest says that it did not comply. 

14 M. Soraghan, “House passes FISA overhaul”, The Hill, June 20, 2008; and 
http://www.sourcewatch.org/index.php?title=FISA_Amendments_Act_of_2008 

15 The two most prominent of these cases are Jewel v. NSA and Clapper v. Amnesty. In Jewel v. NSA, EFF is suing the NSA and 
other government agencies on behalf of AT&T customers to stop the illegal, unconstitutional and ongoing dragnet surveillance of their 
communications and communications records. In Clapper v. Amnesty et al., the Supreme Court in 2013 denied the ACLU’s challenge 
to the constitutionality of FISAA based on ‘lack of standing.’ 

16 Guardian, Verizon forced to hand over telephone data – full court ruling, June 6, 2013. 
http://www.theguardian.com/world/interactive/2013/jun/06/verizon-telephone-data-court-order 

17 The ACLU, a subscriber of Verizon and hence more confident that the issue of standing would not be the problem it was in Clapper v. 
Amnesty, returned to court on June 11, 2013, “challenging the constitutionality of the National Security Agency’s mass collection of 
Americans’ phone records.” https://www.aclu.org/national-security/aclu-v-clapper-challenge-nsa-mass-phone-call-tracking. In 
December 2013, Judge William Pauley, of the Federal Southern District of New York, rejected their claim in finding this bulk collection 
did not violate the U.S. Constitution. See: Dan Roberts, NSA mass collection of phone data is legal, federal judge rules, Guardian, 27 
December 2013. http://www.theguardian.com/world/2013/dec/27/judge-rules-nsa-phone-data-collection-legal. However, a week 
earlier, Judge Richard J. Leon of the Federal District Court for the District of Columbia had ruled in a similar case “that the 
National Security Agency program that is systematically keeping records of all Americans’ phone calls most likely violates the 
words, details of every call originated or terminated in the U.S. are captured. The granularity of collection 
is also impressive. According to the secret FISA court order: 


“Telephony metadata includes comprehensive communications routing information, including, but 
not limited to session identifying information (e.g. originating and terminating telephone number, 
International Mobile Subscriber Identity (IMSI) number, International Mobile station Equipment 
Identity (IMEI) number, etc.), trunk identifier, telephone calling card numbers and duration of 
call.” (Vinsen, 2013, p. 2) 


Location data can also be included. While metadata does not enjoy the same degree of legal protection as 
message content, many have argued it can often be just as sensitive and intrusive.¹⁸ Furthermore, persons 
affected by the court order are ‘gagged’, i.e. prohibited from telling any unauthorized person about the 
order. While so far the only order made public is that of Verizon, there is good reason to believe that other 
similarly large telecom providers, such as AT&T, Sprint and Cingular, have also been served with equivalent 
orders. As such, we can conclude that without prior suspicion nearly everyone in the U.S. has all their call 
details routinely reported every day to the NSA. 


2.4 Prism 


Prism, the most recent of the NSA’s large scale domestic data collection programs, involves “tapping directly 
into the central servers of nine leading U.S. Internet companies.”¹⁹ Starting in 2007, the NSA has arranged 
for automated access to the servers of Microsoft, Yahoo, Google, Facebook, PalTalk, AOL, Skype, YouTube 
and Apple. The Washington Post also reports that the 


“NSA collects, identifies, sorts and stores at least 11 different types of electronic communications 
[including] Chats, E-mail, File transfers, Internet telephone [VoIP], Login/ID, Metadata, Photos, 
Social networking, Stored data, Video, Video conferencing”²⁰ 


Prism appears to have arisen in response both to legal and political pressures when the ‘warrantless 
wiretapping’ program came to light, as well as to get around the increasing use of encryption that rendered 
analysis of message content captured by Upstream more difficult. Because the on-line services of these nine 
companies are popular globally and are covered by US law, their users worldwide can expect their personal 
data to be open to inspection by the NSA, with no expectation of legal protection if they are outside the U.S. 


2.5 Upstream 


When the recent round of NSA surveillance revelations broke in June 2013, it was the bulk telephony meta-data 
collection and the Prism program, discussed above, that garnered the greatest media attention. 
However, because of its potentially even wider reach than either of the other two, it is the NSA’s Upstream 
program, revealed later and incidentally, that is arguably the most significant and politically challenging of 
the Agency’s massive data collection programs. There are few Snowden documents yet (as of December 
2013) giving specific details, but it is mentioned in an early Guardian article focused on the Prism program. 


Constitution”. Charlie Savage, Judge Questions Legality of N.S.A. Phone Records, New York Times, 16 December 2013. 
http://www.nytimes.com/2013/12/17/us/politics/federal-judge-rules-against-nsa-phone-data-program.html?nl=us&emc=edit_cn_20131216&_r=1& 
Given this disparity in findings and the significance of the issues at stake, the 
constitutionality of the NSA’s bulk telephony metadata collection program will likely go before the Supreme Court. 

18 Jane Mayer, What’s the Matter with Metadata?, New Yorker, June 6, 2013. 
http://www.newyorker.com/online/blogs/newsdesk/2013/06/verizon-nsa-metadata-surveillance-problem.html 

19 Barton Gellman and Laura Poitras, U.S., British intelligence mining data from nine U.S. Internet companies in broad secret 
program, Washington Post, June 6, 2013. http://www.washingtonpost.com/investigations/us-intelligence-mining-data-from-nine-us-internet-companies-in-broad-secret-program/2013/06/06/3a0c0da8-cebf-11e2-8845-d970ccb04497_story.html 

20 Barton Gellman and Todd Lindeman, Inner workings of a top-secret spy program, Washington Post, June 29, 2013. 
http://apps.washingtonpost.com/g/page/national/inner-workings-of-a-top-secret-spy-program/282/ 
A top secret training slide (see Figure 5), with a world map showing submarine traffic patterns as 
background, summarizes Upstream as “Collection of communications on fiber cables and infrastructure as 
data flows past.”²¹ As this quote suggests, there appear to be two main techniques for accessing data 
networks: installing fiber optic ‘splitters’ within major internet switches (infrastructure) and, where the 
switch operators are not sufficiently cooperative, taking the technically more challenging route of tapping 
into the cables at some point along the route between the switches. Since much of the international internet 
traffic travels by submarine fiber optic cable, this means installing taps at landing stations or even 
mid-ocean.²² In both forms of interception, deep packet inspection (DPI) is used to examine and store all aspects 
of the traffic, including meta-data (e.g. to- and from- IP addresses in the packet headers) as well as 
communicative content (i.e. the packet ‘payload’). Since messages, such as for email, are broken into a 
series of packets for transmission, these need to be reassembled before the actual message content can be 
analysed. 
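
To make the reassembly step concrete, the following Python sketch splits a message into packets, each with a small header (source address, destination address, sequence number) and a payload, then restores the message from packets that arrive out of order. It is a deliberately simplified illustration, not a description of any NSA or vendor system: the Packet class, the packetize and reassemble functions and the fixed payload size are assumptions made for this example, and real protocols such as TCP use byte-offset sequence numbers and considerably more header fields.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    src: str        # header: source IP address (meta-data)
    dst: str        # header: destination IP address (meta-data)
    seq: int        # header: position of this packet within the message
    payload: bytes  # communicative content carried by this packet


def packetize(message: bytes, src: str, dst: str, size: int = 8) -> List[Packet]:
    """Split a message into packets carrying at most `size` payload bytes each."""
    return [
        Packet(src, dst, seq, message[offset:offset + size])
        for seq, offset in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Restore the original message by ordering packets on their sequence number."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


# Addresses come from the ranges reserved for documentation (RFC 5737).
message = b"Meet me at the station at noon."
packets = packetize(message, "192.0.2.1", "198.51.100.7")

random.shuffle(packets)  # packets may arrive in any order
print(reassemble(packets) == message)  # True
```

The header fields alone already reveal who is communicating with whom; the payload only becomes readable once all packets of a message have been collected and put back in order.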
Figure 5: NSA training slide for Prism program²³ 


We turn now to a more detailed discussion of Upstream and the NSA’s domestic U.S. ‘warrantless 
wiretapping’ program and where it is most likely to have installed its splitter operations. 


3  Warrantless Wiretapping Program in the U.S. 


The New York Times first reported the interception of US domestic communications by the NSA in late 
2005.²⁴ But it wasn’t until Mark Klein, a recently retired AT&T technician, revealed the existence of a 
secret ‘splitter’ operation at 611 Folsom St in San Francisco that the scope and technical details of NSA 
surveillance came to public light. Klein reported that AT&T had spliced fiber-optic splitters into 16 ‘peering 
links’ that connected its network with other major carriers and internet exchange points, directing an exact 
copy of all the traffic passing through these links into a ‘secret room’ on the 6th floor, Room 641A. Here a 


21 The British communications intelligence agency Government Communications Headquarters (GCHQ) conducts a similar 
fibre-optic cable interception program by the name of Project Tempora. See: Ewen MacAskill, Julian Borger, Nick Hopkins, Nick 
Davies and James Ball, “Mastering the internet: how GCHQ set out to spy on the world wide web”, Guardian, 21 June 2013. 
http://www.theguardian.com/uk/2013/jun/21/gchq-mastering-the-internet 

22 The nuclear submarine, Jimmy Carter, has been specially modified to conduct these underwater cable tapping operations. See: 
Associated Press, New Nuclear Sub Is Said to Have Special Eavesdropping Ability, New York Times, February 20, 2005. 
http://www.nytimes.com/2005/02/20/politics/20submarine.html?_r=0 

23 James Ball, NSA's Prism surveillance program: how it works and what it can do, Guardian, 8 June 2013. 
http://www.theguardian.com/world/2013/jun/08/nsa-prism-server-collection-facebook-google 

24 J. Risen and E. Lichtblau, Bush Lets U.S. Spy on Callers Without Courts, New York Times, December 16, 2005. 
http://www.nytimes.com/2005/12/16/politics/16program.html?ex=1145419200&en=87817a067833b164&ei=5070 
Narus STA 6400 analyzed all the packets passing by, providing “complete visibility for all Internet 
applications” according to its vendor. In other words, this operation enables the NSA to monitor not only 
who is communicating with whom, but potentially the entire contents of these communications as well. 

Klein’s revelations provoked a strong reaction from civil liberties organizations, resulting in over four 
dozen court cases against U.S. telecom carriers and the federal government. These cases allege that the 
carriers illegally complied with multiple surveillance requests from the NSA during the Bush Administration 
to provide, without warrants, specific information about US citizens.²⁵ 

The secrecy that pervades this topic makes it difficult to determine whether the NSA surveillance 
program is continuing or not, but the recent reports strongly suggest that not only is it on-going, but it is 
expanding during the Obama Administration. James Bamford’s article in the March 2012 issue of Wired 
details the construction of an enormous data centre in Bluffdale, Utah capable of storing and analyzing the 
complete record of interpersonal internet traffic (Bamford, 2012). In July 2012, three whistleblowers, 
William E. Binney, Thomas A. Drake, and J. Kirk Wiebe, all former NSA employees, gave evidence in 
support of the surveillance allegations in the Electronic Frontier Foundation's (EFF's) (2012) lawsuit 
against the government's mass surveillance program, Jewel v. NSA. In particular, Binney, a former NSA 
technical director, claims the then current program, known as Stellar Wind, is capable of intercepting 
virtually all email in the US and much else.²⁶ The more recent revelations by whistleblower Snowden further 
confirm the earlier claims, demonstrate that they are part of a much wider suite of surveillance programs 
and better establish state surveillance as a vital topic for (inter)national debate. 

Given that the NSA’s internet surveillance is on-going but its details still a closely guarded secret, 
how can we determine where it is being conducted, and whose traffic is capable of being intercepted? These 
are the central questions we now examine. We will focus our investigation on AT&T, and the splitter 
installation at 611 Folsom Street, as this is the best documented case and provides a model for the 
interception of internet traffic at other major internet exchange points in the U.S. and presumably by other 
major carriers. 


4 Mapping NSA surveillance sites and internet traffic through them 


4.1 Where are the NSA splitter sites? 


While we know of the NSA splitter site at 611 Folsom Street, what about additional suspected sites? Based 
on his conversations and meetings with other AT&T technical staff, Klein (2009) reported that similar 
installations were in place in five other locations: Seattle, San Jose, Los Angeles, San Diego and Atlanta. 
However, these 6 sites would not be sufficient to comprehensively intercept US internet traffic, as there are 
other, more important routing centres that would be much more attractive for interception purposes. Scott 
Marcus, a former Federal Communications Commission expert, estimates that AT&T had 15-20 splitter 
sites.²⁷ However, he wasn’t able to identify any sites in particular without further specific evidence. 
Presuming that the NSA’s goal was to be able to intercept the largest proportion of US internet traffic with 
the fewest possible sites (a hypothesis well confirmed by the subsequent Snowden revelations), we developed 
a crude schema for scoring cities based on how much internet traffic was likely to pass through them. Using 
only our personal estimates of 3 determinants of internet prominence, with crude relative weightings: 
telecom infrastructure (10); city size (population) (5); and geographic location in relation to other major 


25 While the Bush Administration initially denied the role of telecommunications carriers, it subsequently confirmed this in general 
terms. Eric Lichtblau, “Role of Telecom Firms in Wiretaps Is Confirmed”, New York Times, August 24, 2007. 
http://www.nytimes.com/2007/08/24/washington/24nsa.html?ex=1345608000&en=4e8428cf3d46306c&ci=5090&partner=rssuserland&emc=rss 

26 P. Harris, “US data whistleblower: ‘It's a violation of everybody's constitutional rights’”, Guardian, Sept. 15, 2012, 
http://www.guardian.co.uk/technology/2012/sep/15/data-whistleblower-constitutional-rights 

27 PBS Frontline. Spying on the Home Front, May 15, 2007. http://www.pbs.org/wgbh/pages/frontline/homefront/view/ 






population centres and telecommunications traffic patterns (4), we developed an ordered ranking of the US 
cities most likely to host an NSA splitter installation. To test our hypothesis, and more generally to provide a 
means for internet users to see where their data traveled and was possibly subject to surveillance, we developed 
the IXmaps software system. Using a crowd-sourced approach, we invite geographically scattered users to 
install a customized version of the common traceroute28 program that populates our database. We add 
location data for the routers encountered using a variety of standard geo-location techniques, and from this 
users can then selectively map their own or others’ traceroutes via a Google Maps mashup. Currently the 
database contains over 26,000 traceroutes contributed by more than 200 submitters from over 180 
originating addresses in North America to in excess of 2600 destination URLs. We examined all the US-only 
routes in the IXmaps database, which currently number 2927. Of these, 2839 passed through at least 
one of the 18 cities we identified as the most likely sites for NSA splitter operations. In other words, 
installing splitters in the major internet exchange points in just these cities would be sufficient for the NSA 
to intercept 97% of our US-only traceroutes. These are shown in Figure 6. 
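
The scoring schema described above can be illustrated with a minimal sketch. Only the three relative weightings (10, 5 and 4) come from the text. How the estimates were combined is not stated, so the weighted sum used here is an assumption, and the city names and ratings are hypothetical placeholders rather than the estimates actually used.

```python
# Relative weightings taken from the text; everything else is hypothetical.
WEIGHTS = {"telecom": 10, "population": 5, "location": 4}


def city_score(ratings):
    """Weighted sum of a city's ratings (each rating assumed to lie in 0..1)."""
    return sum(WEIGHTS[factor] * ratings[factor] for factor in WEIGHTS)


def rank_cities(city_ratings):
    """Return city names ordered from highest to lowest score."""
    return sorted(city_ratings,
                  key=lambda city: city_score(city_ratings[city]),
                  reverse=True)


# Placeholder cities and ratings, for illustration only.
example = {
    "City A": {"telecom": 1.0, "population": 0.4, "location": 0.5},
    "City B": {"telecom": 0.6, "population": 1.0, "location": 0.5},
    "City C": {"telecom": 0.2, "population": 0.3, "location": 1.0},
}
ranking = rank_cities(example)
```

With these placeholder ratings the heavily weighted telecom factor dominates, so a city with strong infrastructure outranks a larger city with weaker infrastructure.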


Figure 6: 18 US cities most likely to host NSA splitters29 


While this result of course does not prove that these cities actually have NSA splitter operations, nor that 
the NSA has access to all the internet exchange points in them, it is powerful confirmation that, by 
installing splitters at relatively few strategic internet choke points, it is technically feasible for the NSA to 
intercept a very large proportion of U.S. internet traffic. This high percentage helps justify our claim that 
these cities are strongly suspected of hosting NSA warrantless surveillance facilities. It also vividly challenges 
the popular image of the internet as a ‘cloud.’ 
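
The 97% figure is the share of US-only routes with at least one hop in a suspected city. A minimal sketch of that calculation follows, assuming each route is simply a list of city names. The function name and the toy routes are hypothetical, and only the counts 2839 and 2927 are taken from the text.

```python
def coverage(routes, suspect_cities):
    """Fraction of routes that have at least one hop in a suspect city."""
    if not routes:
        return 0.0
    suspects = set(suspect_cities)
    hits = sum(1 for hops in routes if suspects.intersection(hops))
    return hits / len(routes)


# Toy routes (hypothetical) to show the mechanics.
toy_routes = [
    ["Toronto", "Chicago", "San Francisco"],
    ["Boise", "Spokane"],
    ["Miami", "Atlanta"],
]
toy_share = coverage(toy_routes, ["Chicago", "Atlanta"])

# Counts reported in the text: 2839 of 2927 US-only routes.
reported_share = 2839 / 2927
print(f"{reported_share:.1%}")
```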


4.2 Does my personal internet data pass through NSA splitter sites? 


With the suspected NSA cities identified, we are in the position to give individual internet users a reasoned 
indication of whether their particular communications are likely to be subject to warrantless interception. 
Exploiting the feature of IXmaps to target any user-provided URL, individuals can generate traceroutes 
customized to their own internet activities. IXmaps renders both the tabular and map views of these 


28 See: http://en.wikipedia.org/wiki/Traceroute 

29 Biases in the sample of traceroutes contributed by users to the database mean that this particular list of cities and the relative 
amount of domestic U.S. traffic that could be intercepted by NSA splitters installed in them need to be treated with caution. The 
chronic difficulties, widely recognized in the internet routing research community, in accurately geo-locating routers based on 
hostnames, IP addresses and latencies, further complicate the picture. Nevertheless, we believe the overall conclusion that a relatively 
small number of cities is sufficient to capture a very large proportion of US traffic remains valid. For more on these issues and the 
IXmaps project generally, see Clement, A. “IXmaps – Tracking your personal data through the NSA’s warrantless wiretapping sites” 
IEEE - ISTAS conference, Toronto, June 26-27, 2013. https://www.dropbox.com/s/9y4xtavova2qtj4/ISTAS13 paper 26 IXmaps %E2%80%93 Tracking May 22.pdf 
traceroutes with distinctive icons to highlight those hops most susceptible to NSA splitting. Table 1 and 
Figure 7 show a traceroute (TR 1859), from a home in Toronto to the San Francisco Art Institute, with 
hops in the AT&T facilities in both San Francisco and Chicago flagged as likely sites of NSA interception 
along the way. 


Traceroute detail 

Traceroute id: 1859 
origin: M5S2M8    destination: sfai.edu [63.197.251.33] 
submitted by: AndrewC    submitted on: 2009-12-13 12:06:51-05 

Hop  IP Address       Min. Latency  Carrier                 Location          GeoPrecision    Hostname 
0    206.248.154.0    0             TekSavvy                Toronto ON        city level      206.248.154.0 
1    69.196.136.66    0             TekSavvy                Toronto ON        city level      2120.ae0.bdr02.tor.packetflow.ca 
2    64.34.236.121    0             Peer 1                  Toronto ON        city level      64.34.236.121 
3    216.187.114.145  0             Peer 1                  Toronto ON        building level  10ge.xe-2-0-0.tor-151f-cor-1.peer1.net 
4    216.187.114.133  0             Peer 1                  Toronto ON        building level  10ge.xe-0-0-0.tor-1lyg-cor-1.peer1.net 
5    216.187.114.141  15            Peer 1                  Chicago IL        building level  oc48-po5-0.chi-eqx-dis-1.peer1.net 
6    206.223.119.79   15            ESNET - ESnet           Chicago IL        building level  ex1-g1-0.eqchil.sbcglobal.net 
7    151.164.99.110   15            AT&T Internet Services  Chicago IL        city level      151.164.99.110 
8    151.164.99.129   15            AT&T Internet Services  Chicago IL        city level      151.164.99.129 
9    12.122.79.85     15            AT&T WorldNet Services  Chicago IL        city level      gar3.cgcil.ip.att.net 
10   12.122.133.218                 AT&T WorldNet Services  Chicago IL        city level      cr1.cgcil.ip.att.net 
11   12.122.44.121    62            AT&T WorldNet Services  San Francisco CA  building level  cr1.sffca.ip.att.net 
12   12.123.15.110    62            AT&T WorldNet Services  San Francisco CA  building level  cr83.sffca.ip.att.net 
13   12.122.110.113   62            AT&T WorldNet Services  San Francisco CA  building level  gar26.sffca.ip.att.net 
14   12.91.92.250     62            AT&T WorldNet Services  San Francisco CA  building level  12.91.92.250 
15   63.197.251.33    62            AT&T Internet Services  San Francisco CA  Maxmind         63.197.251.33 

Legend 
[icon] NSA: Known NSA listening facility in the city 
[icon] NSA: Suspected NSA listening facility in the city 


Table 1: Traceroute details for TR #1859, Toronto to San Francisco Art Institute 


Figure 7: IXmaps rendering of traceroute #1859, Toronto to San Francisco Art Institute 
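
The flagging of hops in Table 1 can be sketched roughly as follows. This is not the IXmaps implementation, whose flagging logic is not described here. It is a hypothetical illustration that marks a hop by the city it is located in, following the legend, with an optional carrier filter. The hop data is a subset of Table 1. The two city sets contain only the cities named in this section, not the full list of 18.

```python
from typing import NamedTuple, Optional


class Hop(NamedTuple):
    number: int
    carrier: str
    city: str


# Cities named in the text; the full list of 18 is not reproduced here.
KNOWN_CITIES = {"San Francisco"}   # the 611 Folsom Street site
SUSPECTED_CITIES = {"Chicago"}


def flag_hop(hop: Hop, carrier_prefix: Optional[str] = None) -> Optional[str]:
    """Return 'known', 'suspected' or None for one hop.

    If carrier_prefix is given, only hops whose carrier name starts
    with that prefix are eligible for flagging (an assumed refinement).
    """
    if carrier_prefix is not None and not hop.carrier.startswith(carrier_prefix):
        return None
    if hop.city in KNOWN_CITIES:
        return "known"
    if hop.city in SUSPECTED_CITIES:
        return "suspected"
    return None


# Subset of the hops listed in Table 1.
hops = [
    Hop(0, "TekSavvy", "Toronto"),
    Hop(5, "Peer 1", "Chicago"),
    Hop(9, "AT&T WorldNet Services", "Chicago"),
    Hop(11, "AT&T WorldNet Services", "San Francisco"),
]
flags = {h.number: flag_hop(h) for h in hops}
att_flags = {h.number: flag_hop(h, "AT&T") for h in hops}
```

Flagging by city alone marks every hop located in a listed city, whereas the carrier filter restricts the flags to AT&T hops, which is closer to the description of traceroute 1859 in the text.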


4.3 Non-US traffic may also be exposed to NSA splitters 


So far we have concentrated on traffic that explicitly travels via US routing centres, i.e. originating or 
terminating in the US, or both. It is well known, at least in internet circles, that traffic that neither 
originates nor terminates in the US may nevertheless transit via the US, mainly due to the interconnection 
arrangements of the major international carriers (Norton, 2012, p. 71). However, the extent of this practice 
and its surveillance implications are less well known. While this affects many countries, Canadian traffic in 
particular, largely due to its proximity to the US as well as the structure of the North American internet 
service industry, is especially prone to routing via the US. We refer to traffic that originates and terminates 
in the same country, but transits another, as “boomerang traffic.” Analysis of IXmaps data reveals that 
approximately one third of the Canadian routes follow a boomerang pattern. That long distance Canadian 
routes may be routed via the US is not surprising, but we were struck by the number of routes that start 
and end in the same Canadian city, but are routed via the US. We have found over 100 such boomerang 
routes based in Toronto alone. Figure 8 shows one example that transits New York and Chicago, both cities 
strongly suspected of hosting NSA splitters. Whether crossing the continent, or returning to the same city, 
Canadian boomerang traffic is almost entirely exposed to NSA surveillance. 
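
The boomerang definition above translates directly into a check on the sequence of countries a route passes through. The sketch below is illustrative only. The function names and toy routes are hypothetical, and it assumes that "Canadian routes" means routes that both start and end in Canada.

```python
def is_boomerang(hop_countries):
    """True if a route starts and ends in the same country but transits another."""
    if len(hop_countries) < 3:
        return False  # no intermediate hop, so nothing can transit elsewhere
    home = hop_countries[0]
    if hop_countries[-1] != home:
        return False
    return any(country != home for country in hop_countries[1:-1])


def boomerang_share(routes, country):
    """Share of routes starting and ending in `country` that are boomerangs."""
    domestic = [r for r in routes if r and r[0] == country and r[-1] == country]
    if not domestic:
        return 0.0
    return sum(is_boomerang(r) for r in domestic) / len(domestic)


# Toy routes given as country codes per hop (hypothetical data).
toy_routes = [
    ["CA", "US", "US", "CA"],  # boomerang: leaves Canada and returns
    ["CA", "CA", "CA"],        # stays within Canada
    ["CA", "CA"],              # stays within Canada
    ["CA", "US"],              # terminates in the US, so not a Canadian route
]
share = boomerang_share(toy_routes, "CA")
```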


Figure 8: A Canadian boomerang route based in Toronto (TR6896) 


5 Discussion 


When a government conducts surveillance on its citizens and acts outside conventional legal bounds, as the 
US government has arguably done in the case of the NSA’s surveillance programs, the norms of liberal 
democratic governance are seriously violated and demand public accountability. However, the usual 
mechanisms for such accountability have not so far succeeded. Congress passed legislation in 2008 that 
retroactively granted the implicated telecom carriers immunity from prosecution and the executive branch 
has until recently largely stymied court challenges by invoking a blanket “state secrets” exemption. While 
over the past decade there have been several notable journalistic exposés (Bamford, 2008; PBS Frontline, 
2007) and brave whistleblowers from both the NSA and AT&T have brought damning information to light, 
it is only months after the Snowden revelations that the public policy debate is getting underway in earnest 
and may still falter. 

This paper has attempted to contribute to this debate by exploring the geographic dimensions of 
the NSA surveillance programs. To counteract the popular but misleading metaphor of the internet as a 
‘cloud’, we have reviewed the three main NSA interception programs, bulk telephony metadata collection, 
Prism and Upstream, in each case highlighting the critical role that spatial and locational features play in 
what data is collected, on whom and how. In aggregate, these interception programs are capable of and 
likely are capturing almost all electronic communication, at least within the U.S. 

Compounding the usual difficulties in holding powerful players responsible for their actions is the 
intrinsically invisible character of internet surveillance. Beyond the notorious secrecy of the NSA, the 
surveillance is conducted out of sight and leaves no discernible trace. For the great majority of the 
population, the workings of the internet, especially in its core, are dauntingly complex and inscrutable. We 
have developed the IXmaps internet mapping application to overcome these obstacles by promoting greater 
transparency and visibility of the NSA surveillance activities. Within the limits of the available information, 
we have been able to reveal the likely sites of NSA surveillance operations and show interested individuals 
where the NSA may intercept their own data packets. By using interactive maps and graphic images, we 
hope to make the surveillance more vivid, discussable and a matter of public concern. 

But more than just serving concerned citizens and curious explorers, IXmaps encourages and relies 
on its users to contribute to building its database of traceroutes. This crowdsourcing is necessary to ensure 
a good geographic distribution of originating points, so that the internet core is well surveyed, but also 
provides the means for people to view the internet from their own personal perspectives. Perhaps more 
importantly, integrating these contributions in an open and publicly visible manner constitutes a form of 
collective counter-surveillance with the potential to empower participants in holding the US government 
and its national security agency to account for its warrantless surveillance. 

We also hope this constructive surveillance studies approach to the problem of unaccountable state 
surveillance can stimulate a critical and productive discussion within the information studies field about 
its ‘darker side’ and how we as implicated participants can act responsibly, individually and collectively, in 
the face of a most serious societal challenge. In particular, widespread faith in the power of harnessing ever 
more information capturing and processing capabilities to solve societal problems, as exemplified in the NSA 
programs and current enthusiasm more generally for ‘big data’ techniques, needs to be tempered by a careful 
examination of its implications in light of other values, such as privacy and democratic governance. 
Abstract 

Awareness information, information about others’ presence and activities that allows us to determine 
their availability for conversation, plays an important role in workplace communication, as people often 
gather and act on it in the process of negotiating mutual availability. This paper presents a laboratory 
experiment examining how gathering awareness information is affected by the cultural backgrounds and 
mutual familiarity of collaborators. Results suggest that members of cultures considered more 
relationship-oriented (e.g., China) gathered awareness information less frequently than members of 
cultures that are more task-oriented (e.g., the United States). We argue that this is because of the 
different motivations for interaction prioritized by these cultures. We did not find any effect for 
familiarity, but provide several alternative explanations for this result. 
1 Introduction 


Awareness information about others’ presence and activities (Gross, Stary, & Totter, 2005) can help people 
determine the availability of their colleagues before initiating interactions. People can use this information 
to time their interruptions so as to minimize disruption of others (D. Tang & Birnholtz, 2010). Common 
examples of gathering awareness information include checking whether someone is online or offline using 
instant messaging status (Gross et al., 2005; D. Tang & Birnholtz, 2010), or simply poking one’s head into 
someone’s office to see if they are there. However, these behaviors can be socially delicate in that, when 
timed or executed poorly, they can become interruptions themselves, and may therefore cause unnecessary 
stress or relational strain (Carton & Aiello, 2009; Makin, Rout, & Cooper, 1988). A substantial body of 
research has explored the design of systems that provide awareness information without causing negative 
social consequences (for a review, see J. Tang, 2007). 

However, there is reason to believe that many of the findings from that research merit 
reconsideration in today’s increasingly globalized working environment. With barriers to communication 
being eased by technologies, and communication across national borders occurring every day (Diamant, 
Lim, Echenique, Leshed, & Fussell, 2009; Krishna, Sahay, & Walsham, 2004), research has found that 
people from different countries may use the same technology quite differently, partly due to different social 
norms and cultural expectations (Ur & Wang, 2013). 

One area in which these differences have significant potential consequences is the use of awareness 
information in timing interruptions. There is some evidence to suggest that people from Chinese and 
American cultures, which have been shown to differ in the extent to which people focus on getting a job 
done vs. nurturing their relationships with collaborators, differ in their concerns about interruption and 
initiating interaction (Schuster & Copeland, 2006; Triandis, 1995; Triandis, Bontempo, & Villareal, 1988). 
We argue that members of these cultures may therefore differ in the extent and frequency of their use of 
awareness information. 

Additionally, people develop different levels of familiarity with their collaborators: some may be 
more familiar because they have been working or spending some time together; others may be complete 
strangers with no previous interaction history. Research suggests people interact with strangers and non- 
strangers differently, in that they are more informal and less polite when they are interacting with a friend 
than with a stranger (Brown & Levinson, 1987; Wolfson, 1988). These behavioral differences may also be 
reflected in how people gather awareness information about their collaborators. Since gathering awareness 
information is socially delicate and can be perceived as invasive and impolite, we argue that people will 
gather awareness information more frequently from people they know rather than from strangers. 

These two factors, cultural background and familiarity, are not only important individually, but 
also act together. Research suggests Americans and Chinese may differ in the ways they treat strangers and 
friends/acquaintances, in that Chinese may become less relationship-oriented when they are interacting 
with strangers than with friends, whereas no study has documented a similar tendency for Americans 
(Tickle-Degnen & Rosenthal, 1990; Yang, 1995). Understanding and acknowledging the potential differences caused 
by culture, familiarity, and their interaction will not only help researchers develop and evaluate systems in 
a value-sensitive way, but also shed light on how relational closeness impacts people’s collaboration. 

The present study examines how different cultural values and relational familiarity affect how 
people gather awareness information in a simulated work environment. We found that American and 
Chinese users did use the same system, OpenMessenger, differently in terms of frequency of gathering 
awareness information. Contrary to our expectation, no difference was found along the familiarity 
dimension. 


2 Background 


2.1 Awareness information 


Awareness information refers to any information about others’ presence and activities (Gross et al., 2005). 
It has an interesting two-sided nature. On one hand, studies have found that sharing and making use of it 
not only enhance group task performance (Weisband, 2002), but also strengthen members’ feeling of group 
identity (Huijnen, Ijsselsteijn, Markopoulos, & de Ruyter, 2004). Abstract display of this information helps 
other collaborators to time the initiation of interaction, to get the information they need while minimizing 
the interruption (Dabbish & Kraut, 2003, 2004). Tang and Birnholtz (D. Tang & Birnholtz, 2010) found 
sharing such information increases the social attraction between collaborators, partly because users can 
better explain the unresponsiveness of their partner. Reynolds et al.’s study found simply displaying the 
awareness information saves time for explicit communication and therefore is crucial for time-sensitive tasks 
(Reynolds, Birnholtz, & Lee, 2012). 

On the other hand, however, there are reasons that restrain people from conducting awareness 
checks on an as-needed basis, despite their advantages for task performance. Dabbish and Kraut (Dabbish & 
Kraut, 2004) found that awareness information is used to time the interruption only when the collaborators 
are motivated on a team basis. Birnholtz et al. (Birnholtz, Bi, & Fussell, 2012) noted that the visibility of those 
awareness checks has an impact: when people believe their awareness information gathering behavior 
can be seen by the collaborator whose information is being gathered, they conduct such checks 
significantly less frequently than when they believe people cannot see them. The authors attributed this to 
concern about the social appropriateness or possible negative consequences (e.g., annoyingness) of the checks. 
The visibility of awareness checks raises an important point that has emerged in recent discussion 
of this issue. As Tang (J. Tang, 2007) notes in his review, a key aspect of negotiating mutual availability 
is having a multi-staged “approach” that allows people to first check on others without necessarily 
interrupting them, and then gradually allow their interest in interaction to become more salient. Birnholtz 
et al. (Birnholtz et al., 2012) build on this, noting that there is potential utility in allowing those being 
checked on to see (and thus potentially respond to) all awareness checks. This, in principle, allows for a 
more natural negotiation of mutual attention in that all acts of gathering can be seen and responded to (see 
Birnholtz et al., 2012; Birnholtz, Schultz, Lepage, & Gutwin, 2011, for a much more detailed discussion of 
these issues). 

It is with this premise in mind that we use the OpenMessenger system in this study. We 
acknowledge that most commercial systems that provide awareness information (e.g., Skype, Google Chat, 
etc.) do not make all awareness checks visible and do not support a natural negotiation of mutual attention. 
We believe, however, that doing so could provide substantial benefit, and that the general lessons derived 
here are more broadly applicable as well. 

In the present study, where awareness checks are made visible to both parties, we describe below 
why people may also be concerned about the social or relational appropriateness of their awareness 
information gathering. However, to our knowledge, few studies have explored other reasons that account 
for the difference in such behaviors, and it is the goal of the present study to explore two of them: cultural 
background and familiarity. 


2.2 Cultural background: Task vs. relationship focus 


Being a notoriously broad umbrella term, culture or cultural background has raised a great deal of 
controversy in definition (Hofstede, 1991). While we acknowledge equating it to nations lacks granularity, 
further examining how it affects members of subgroups (e.g., race and ethnicity) will go beyond the scope 
of the present study. Since intercultural collaboration nowadays still predominantly refers to and operates 
on the national level (Bird & Osland, 2006), we decided to adopt the definition of culture as the set of 
values, norms, and customs shared by individuals from a particular country (Doney, Cannon, & Mullen, 
1998; Hofstede, 1991), and we are using the United States and China as the two cultural backgrounds for 
comparison in our study. 


American and Chinese cultures are different along many dimensions, one of which, and the most 
pertinent one to our study, is the emphasis each culture places on maintaining social relationships with 
collaborators versus completing the task at hand in the most efficient manner. Triandis (Triandis et al., 
1988) found people from individualistic cultures (e.g., United States, Canada) focus more on task efficiency 
than on relationship development and maintenance, and that when facing conflicts between task completion 
and interpersonal relationship issues, they tend to choose to complete the task rather than maintain their 
relationships. In contrast, members of collectivist cultures (e.g., China, Japan) prioritize relationship 
maintenance more than task efficiency. 

Cultural differences in emphasis on task vs. relationship affect multiple aspects of group work. For 
example, Hamid (Hamid, 1994) found less frequent communication about work and task among Chinese 
students than their New Zealand counterparts, who are more task-oriented. In the workplace, members of 
relationship-oriented cultures tend to focus more on nurturing social relations between co-workers than 
members of task-oriented cultures (Schuster & Copeland, 2006; Shell, 1999; Triandis, 1995; Triandis et al., 
1988). Hu and Jasper (Hu & Jasper, 2007) contend that even in a task situation, members of relationship- 
oriented cultures tend to be more sensitive than members of task-oriented cultures to social cues (e.g., the 
attitudes of people around them) (Ruble & Nakamura, 1972) and their task performance is more likely to 
be influenced by them (Krishna et al., 2004). 
Taken together, these findings suggest that members of relationship-oriented cultures are more 
likely to consider the relational effect a certain behavior or conversation may have in the workplace, whereas 
members of task-oriented cultures are more likely to consider the potential benefits of that behavior or 
conversation for getting the task done quickly and efficiently. 

Although we are aware of no study on how task versus relationship orientation directly influences 
awareness information gathering behavior, it is clear from the literature that when facing a tradeoff between 
improved task performance and potential damage to a social relationship, members of relationship-oriented 
cultures are more likely to consider the relational impact of their actions than their counterparts of task- 
oriented cultures. The case of gathering awareness information provides a useful instantiation of this 
tradeoff. On the one hand, awareness information can be useful for interruption timing and improving task 
performance (Dabbish & Kraut, 2003). On the other hand, the visibility of gathering this information can 
have a social cost if it distracts or annoys a collaborator (Birnholtz, Gutwin, & Hawkey, 2007; Clement, 1994; 
Heath, Luff, & Sellen, 1995). 

In the present study, we compare awareness information gathering behaviors of American 
participants (who tend to be task-oriented) and of Chinese participants (who tend to be relationship- 
oriented). We hypothesize: 


H1: American participants will gather awareness information more frequently than Chinese 
participants. 


2.3 Familiarity 


Familiarity within social relationships refers to the degree to which people are close to and comfortable with 
each other (Little, 1965). People develop different levels of familiarity with others due to frequency of 
interaction (Whittaker, Frohlich, & Daly-Jones, 1994): close friends, friends, acquaintances, or complete 
strangers. Those categories are influential factors in verbal as well as nonverbal communication. 

In verbal communication, utterances toward a stranger tend to be more polite than those toward a 
friend (Gupta, Walker, & Romano, 2007). Similarly, Planalp and Benson found that conversation between 
close friends is characterized by more interruptions, criticism, and disagreement than conversation between 
acquaintances (Planalp & Benson, 1992). 

In nonverbal communication, Whittaker et al. (Whittaker et al., 1994) also found that people who 
rated each other as more familiar had less formal communication and produced more interruptions; in their 
study, “interruption” meant starting an interaction without achieving shared attention. One way to interpret 
this is that people who are more familiar with each other do not regard such interruptions as impolite in the 
way that complete strangers would. 

These findings are consistent with politeness theory, which centers on the concept of “face”, which 
refers to the image people have of themselves, and that they believe others have of them, with socially 
desirable attributes. Acts with the potential to harm this positive image are considered face-threatening 
acts (FTA) (Brown & Levinson, 1987). Politeness theory suggests that FTAs will be more likely to occur 
in closer relationships, because people who are more intimate with each other understand that they have 
more face concern for each other, and thus the behaviors are not seen as impolite or offensive (Culpeper, 
1996). Leech (Leech, 1983) even went further to argue in his Banter Principle that being impolite to each 
other is a way to cultivate intimacy; and the closer the relationship is, the less necessity there is to follow 
the principles of being polite. 

As noted above, the more familiar one is with someone, the less polite he/she is likely to be. In 
equal relationships (no party is more powerful than the other), FTAs like interruptions and criticism are 
more likely to occur between friends than between strangers. In the context of awareness information gathering 
behavior, a previous study (Birnholtz et al., 2012) has found that when such gathering is visible to both parties in an 
interaction, people are less likely to do it than when it is invisible, since it might be seen as invasive or 
interruptive and is therefore socially undesirable. Therefore, we argue that gathering awareness information 
about one’s collaborator, in the general sense, may be perceived as invasive and thus a potential FTA. 
Because FTAs are more likely to happen between people in closer relationships than between complete strangers, we 
hypothesize that people will be more likely to gather awareness information about a friend than about a 
stranger. 


H2: Participants will gather awareness information more frequently about their friends than about 
strangers. 


Studies have also suggested that culture and familiarity level may have an interaction effect. For example, 
Gupta et al. (Gupta et al., 2007) found that British and Indians have different perceptions of politeness, in 
that Indians are more informal when talking to friends than British are. However, to our knowledge, there 
is no study addressing how these two factors interact in the United States and China. Therefore, we turned 
to literature that separately examines how Americans and Chinese interact with familiar and unfamiliar 
persons. 

For Americans, Tickle-Degnen and Rosenthal (Tickle-Degnen & Rosenthal, 1990) found strangers 
are more likely to be polite and positive in their conversations than friends, because the importance of 
positivity decreases as the friendship progresses. In other words, the more people are familiar with each 
other, the less they need to feel the “friendliness and caring” (p. 286) in each other. 

For Chinese, Yang (Yang, 1995) described three social categories of interpersonal relations: jiaren 
(family members), shuren (familiar persons, excluding family members), and shengren (acquaintances and 
strangers). The latter two categories differ from each other in terms of interaction principles: familiar persons 
interact on the basis of a combination of utilitarian and affectional concerns, whereas the major concern of 
strangers or acquaintances is gains and losses, especially when money is involved. 

In the case of gathering awareness information, we reason that people may differ along cultural 
lines in their hesitation to gather information about a friend vs. a stranger. Chinese, who value relationship 
cultivation, may be less willing to gather information about a stranger prior to cultivating a relationship 
with them. However, we are aware of no direct evidence supporting this argument, so alternative outcomes 
are also plausible. Therefore we asked: 


RQ1: Is there an interaction effect between culture and familiarity on the frequency of awareness 
checks? 


2.4 Effects of awareness information 


Apart from cultural background and familiarity, we also examined the effects of gathering awareness 
information on perceived task performance, social attraction, and annoyingness. 

The perceived task performance of oneself and one’s partner bears significance in collaboration. 
Like actual task performance, perceived performance is also an important indicator of effectiveness (Costa, 
2003); moreover, it is strongly and positively correlated with team satisfaction. Salanova (Salanova, Llorens, 
Cifre, Martinez, & Schaufeli, 2003) found that it not only moderates actual team performance but also 
builds individual well-being and reduces anxiety level. 

Sallnäs (Sallnäs, 2005) found that people perceive themselves doing better when the partner’s social 
presence increases. Since the awareness information provides information about social presence and 
activities, we hypothesized that: 


H3a: There will be a positive correlation between awareness checks one conducts and the rating of 
perceived task performance of their own. 


The same reasoning may also apply for their ratings of perceived performance of their partner: 
H3b: There will be a positive correlation between awareness checks one conducts and the rating of 
perceived task performance of their partner. 


Although awareness information checking may be beneficial for task performance, doing it too frequently 
may be considered invasive or inappropriate to collaborators (Birnholtz et al., 2012). One way to measure 
these possible effects is to examine social attraction of the gatherer of the information. In this study, social 
attraction refers to the appreciation one evokes (Damian, Baur, & André, 2013; Simpson & Harris, 1994). 
Since previous studies have indicated that awareness checks may be perceived as invasive, we argue that social
attraction should generally be lower toward partners who conduct more awareness checks. We thus hypothesized:


H4: There will be a negative relationship between the number of awareness checks conducted by one’s
partner and one’s social attraction to the partner.


Because awareness checks are mutually visible in the current study, they may affect both how annoying one
believes oneself to be to one’s partner and how annoying one finds one’s collaborator. Frequent checks,
even without the initiation of any conversation, may make oneself appear annoying to collaborators
(Birnholtz et al., 2012). As the literature on familiarity and impression management (Culpeper, 1996;
Leech, 1983) suggests, acts may be judged differently depending on the relationship between the
interactants. People are usually less polite to friends than to strangers, so something that is annoying
to strangers may be considered less so when it is directed at friends. Therefore we hypothesized:


H5a: There will be a positive relationship between the number of awareness checks one conducts and 
the extent to which one perceives oneself as annoying to one’s partner. 

H5b: The correlation between the awareness checks one conducts and perception of one’s own annoyingness
will be stronger for stranger pairs than for friend pairs. 


We believe the same logic that familiarity influences the outcome of a certain behavior also applies when 
the participants rate their partner’s annoyingness. Therefore we hypothesized: 


H6a: There will be a positive relationship between a partner’s awareness checks and one’s rating of
the partner’s annoyingness. 

H6b: The correlation between partner awareness checks and rating of partner annoyingness will be 
stronger for stranger pairs than for friend pairs. 


3 Method 


3.1 Participants 


Participants were 66 students (30.30% male; 51.52% American, the rest Chinese) at a large
university in the northeastern United States. 

In the “friends” condition, participants signed up for the study and were required to come to
the experiment together with a friend who shared their cultural background. We asked
participants to rate how psychologically close they felt to the friend they brought, using a 7-point Likert
scale with 1 being the least close and 7 being the closest. Participants indicated they were quite
close (M = 5.66, S.E. = 0.24).

In the “strangers” condition, the experimenter partnered each participant with a stranger from the 
same culture, and made sure that they had not met prior to the experiment. To verify this manipulation, 
the experimenter asked whether they knew each other before coming to the lab and got all negative 
responses. Each student was paired with someone at the same education level (i.e. undergraduate with 
undergraduate, graduate with graduate) to ensure equal power status.

Each participant received a financial award of $5 for participation, and an additional award varying
from $2 to $5, based on his or her individual task performance, detailed below. 


3.2 Equipment and software 


We used OpenMessenger (OM), a research prototype CMC tool that supports the mutual visibility of 
awareness information gathering (Birnholtz et al., 2011). Participants were seated at tables on opposite 
sides of the room. Each table had a Windows PC and a 20” LCD display. In addition, OM provides a 
peripherally projected display called the “OMNI Window” (see Figure 1), in which avatars for one’s contacts 
are projected on the wall surrounding the user. In our experiment, this display was projected on the wall 
just above the LCD display, with the projection about 5 feet wide and about 2 feet in front of the 
participant. Two avatars were displayed (see Figure 2): the bottom one represented the participant and the 
top one represented their partner. When one gathered awareness information about his/her partner, in the 
form of information about their progress on the task, the partner would see the gatherer’s avatar moving 
closer to his/her own from the top to bottom of the OMNI Window. 


Figure 1: OpenMessenger, with the OMNI window projected peripherally on the wall

Figure 2: Setup of the current experiment, with avatars for the participant (bottom) and partner (top)
both shown peripherally


3.3 Tasks and materials 


The task was designed to replicate a real-world scenario in which a person has both shared and individual 
tasks with complex interdependencies, and in which incentives for shared and individual tasks are mixed. 
Participants collaborated on completing five large jigsaw puzzles on the computers, each of which was 
divided into six smaller sections that were completed one by one. Each person was individually responsible 
for three sections. Each puzzle section was solved in a “puzzle window” (see Figure 3), which consisted of 
the puzzle itself and a space for the puzzle pieces. Participants solved the puzzle by dragging the pieces into 
the puzzle area and snapping them into the grid.


Figure 3: Puzzle interface, with the puzzle section (left) and the pile of pieces (right)


To create interdependency in the task, participants were not permitted to move on to their next puzzle 
section until their partner had finished his/her own current section. Participants who completed a section 
faster than their partner, however, did have the opportunity to earn additional points — and a potential 
cash bonus for themselves — by playing “shape games” individually. 

Every time the participants finished a puzzle section before their partner, they were offered the 
opportunity to play a shape game via a dialog box with options of “Yes” and “No”. If they chose “Yes”, 
they proceeded to the shape game; if they chose “No”, no points would be deducted, but they would not be 
able to play any shape games until after the next time they finished a puzzle section before their partner 
did. 

In playing the shape games, participants were shown a sequence of ordinary objects for 5 seconds. 
They then had to identify the displayed sequence from four options after the original images had disappeared 
(see Figure 4). The shape games were optional. For each shape game successfully completed, the participant 
got 1 point; but if their partner finished the jigsaw puzzle section while the participant was still playing a 
shape game, the participant would lose 5 points. Points were used to determine the financial reward received 
at the end of the experiment. In this way, there was a clear incentive to use awareness information to 
estimate available time for shape games. 


Figure 4: Shape game interfaces, with the initial sequence (left) and the set of choices (right) 


To do so, participants could use OM to view the number of puzzle pieces their partner had correctly placed. 
Effectively, this was an indicator of how far along one’s partner was on the puzzle task and how much time 
the participant had to play shape games. By hovering the mouse cursor over the avatar on top, participants 
would see the number of puzzle pieces correctly placed by the partner. This information was used to help 
the participant decide whether they had enough time to play shape games (see Figure 5). Participants in
the shape game could gather this awareness information whenever they needed to decide whether to play
another shape game.


iConference 2014 Nanyi Bi et al. 


Figure 5: The projected awareness window including: (A) the partner's avatar, (B) the participant's 
avatar, (C) the number of correctly placed puzzle pieces, and (D) the location of correctly placed puzzle
pieces 


Two paper-based questionnaires were also administered. The pre-experiment questionnaire collected 
participants’ experience in IM usage. The post-experiment questionnaire asked about the participants’ 
workload, impression about the partner, individualism/collectivism, evaluation of self-performance, 
task/relationship orientation, and demographic information. 


3.4 Procedure 


After participants arrived at the lab, they completed a consent form and the pre-experiment questionnaire. 
Participants in the stranger conditions also filled in a simple profile note, which documented their name, 
gender, and birthplace (city, country). The experimenter then showed their notes to each other to give the 
participants some basic information about each other’s cultural background. 

Participants were seated in the two corners of a room, facing different directions and separated by 
dividers so they could not see each other. They wore noise-cancelling headphones so they did not hear 
ambient sounds. 

In the beginning of the experiment, participants watched a short instructional video in which they 
were introduced to the puzzle, shape game tasks and the financial incentive. They were told they would 
earn 1 point for each shape game they played correctly and that a wrong answer would mean a 1-point loss. 
Most importantly, if their partner finished their puzzle section while they were still playing a shape game, 
they would lose 5 points. The final total points would be used to determine cash bonuses, with more points 
meaning a larger cash bonus. 
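
As an illustration only, the point rules described in the instructions can be sketched in Python as follows. The record layout and function names are hypothetical, and the rule that an interrupted game costs 5 points regardless of the answer given is an assumption; the conversion from points to the cash bonus is not specified and is left out.

```python
from dataclasses import dataclass

# Point values taken from the instructions given to participants.
POINTS_CORRECT = 1        # shape game answered correctly
POINTS_WRONG = -1         # shape game answered incorrectly
POINTS_INTERRUPTED = -5   # partner finished the puzzle section mid-game


@dataclass
class ShapeGameAttempt:
    """One shape game attempt (hypothetical record layout)."""
    correct: bool
    interrupted: bool  # True if the partner finished while the game was in progress


def score_attempt(attempt: ShapeGameAttempt) -> int:
    # Assumption: an interrupted game costs 5 points regardless of the answer.
    if attempt.interrupted:
        return POINTS_INTERRUPTED
    return POINTS_CORRECT if attempt.correct else POINTS_WRONG


def total_points(attempts: list[ShapeGameAttempt]) -> int:
    """Sum the points over all shape game attempts of one participant."""
    return sum(score_attempt(a) for a in attempts)
```

Under these rules a single interruption cancels five correct answers, which is what makes checking the partner's progress worthwhile.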

This scoring scheme was structured to motivate participants’ awareness information gathering 
behaviors, because in order to gain more points and avoid losses they would need to get information about 
their partner’s progress on the puzzles. In other words, it was to their advantage to know how far along 
their partner was on his/her puzzle task, so they could estimate whether or not there is enough time to 
play shape games to earn points, without being cut off and thereby losing more. They were instructed to 
use OM to collect such information. 

After the instructions but before starting the study, participants completed one simple puzzle
section and one shape game, including a practice session to gather awareness information from their partner,
to familiarize themselves with the game rules and the OM system.


3.5 Measures


Data gathered during the experiment consisted of surveys and log data from the system. The logs contained
the puzzle and shape game durations and scores as well as the awareness checking events, from which we
extracted the number of awareness checks per puzzle section.


3.5.1 Awareness checks 


Counts of awareness checks per puzzle section were extracted from the OM log files. Because the raw 
number of checks was correlated with the amount of time available to participants for these checks, we first 
used the logs to determine how much time was available. Participants only had time for shape games, and 
thus only had reason to perform the awareness checks, if they finished their section of the puzzle before 
their partner did. This means that only one of the two participants could engage in shape games in a given 
puzzle section. We determined which partner had time for shape games, and how much time was available, 
for each section. We then used the total amount of time available to the participant across all puzzles and 
sections as the denominator to compute our rate of awareness checks. The resulting value was positively 
skewed so we used a log transformation prior to analysis. 
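
A minimal sketch of this rate computation, under stated assumptions, is given below. The tuple layout, the function names, and the time unit are hypothetical. How participants with no available time or no checks were handled is not stated, so the sketch simply raises an error in that case.

```python
import math


def available_time(own_finish: float, partner_finish: float) -> float:
    """Time a participant had for shape games in one puzzle section.

    Only the partner who finished first has spare time; the other gets 0.
    """
    return max(0.0, partner_finish - own_finish)


def log_check_rate(sections: list[tuple[float, float, int]]) -> float:
    """Natural log of awareness checks per unit of available time.

    Each tuple is (own_finish, partner_finish, checks) for one section.
    """
    total_time = sum(available_time(own, partner) for own, partner, _ in sections)
    total_checks = sum(checks for _, _, checks in sections)
    if total_time <= 0 or total_checks <= 0:
        # Assumption: the handling of zero rates is not described in the text.
        raise ValueError("log rate is undefined without available time and checks")
    return math.log(total_checks / total_time)
```

For example, a participant with 3 and 1 units of spare time in two sections and 8 checks in total has a rate of 2 checks per unit of time, which the function returns as its natural log.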


3.5.2 Actual task performance 


The shape games were used to rate the actual task performance. Each attempt at a shape game was scored,
and total scores per puzzle section were extracted from the logs. To correct for the amount of time available
for playing shape games, we divided people’s total shape game score by the number of games played.
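
This normalization amounts to a mean score per game. A short sketch follows; the function name and arguments are hypothetical, and the treatment of participants who played no games is an assumption.

```python
def shape_game_performance(section_scores: list[int], games_played: int) -> float:
    """Total shape game score divided by the number of games played."""
    if games_played <= 0:
        # Assumption: the text does not describe participants with no games.
        raise ValueError("no shape games played")
    return sum(section_scores) / games_played
```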


3.5.3 Perceived game performance


Participants’ ratings of perceived task performance on four semantic differential scales (good/bad, fast/slow,
unproductive/productive, efficient/inefficient), for both themselves and their partner, were averaged
(Cronbach’s α for ratings of oneself = .88; Cronbach’s α for ratings of partner = .87).
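
For readers unfamiliar with the reliability statistic reported here, the sketch below computes Cronbach's α from item scores using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of summed scores). The function names and the 1 to 7 response range are assumptions, since the number of scale points is not stated. Because the scale anchors run in different directions (for example good/bad versus unproductive/productive), some items would presumably be reverse-scored before averaging, which is also an assumption.

```python
from statistics import variance


def reverse_score(value: int, low: int = 1, high: int = 7) -> int:
    """Flip an item so that higher always means a more favorable rating.

    The 1-7 range is an assumed default, not taken from the paper.
    """
    return low + high - value


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[i][p] is respondent p's score on item i."""
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    # Summed scale score for each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))
```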


3.5.4 Social attraction 


We adopted the sub-scale from the Interpersonal Attraction Scale (McCroskey & McCain, 1974) to measure 
the social attraction of participants. Scores on five questions pertaining to participants’ desire to interact 
socially with their partners (e.g., “I would like to have a friendly chat with him/her”) were averaged to 
create a social attraction measure (Cronbach’s α = .87).


3.5.5 Annoyingness 


Participants rated the negative effects of their own awareness information checking behavior on their
partners on three semantic differential scales (annoying/calming, unintrusive/intrusive, upsetting/pleasing),
and rated their partner’s behavior on the same scales. Each set of ratings was averaged (Cronbach’s α for
ratings of oneself = .72; Cronbach’s α for ratings of partner = .66).


3.6 Statistical analyses 


All analyses were run using the mixed model procedure in JMP, to take into account the fact that 
participants were nested in dyads. Note that Mixed Model analyses can result in non-integer degrees of 
freedom (Littell, Milliken, Stroup, & Wolfinger, 1996). 
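
One common way to write a model of this kind, given here only as a generic sketch and not as the exact JMP specification, is a random-intercept model for participant i in dyad j. The symbols are introduced for illustration: x_{ij} is the vector of fixed-effect predictors, u_j is the dyad-level random intercept, and ε_{ij} is the individual-level residual.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this form the shared term u_j lets the two partners' outcomes correlate, with intraclass correlation \sigma_u^2 / (\sigma_u^2 + \sigma^2).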


4 Findings 

We present the results in four parts. First we examine how culture and familiarity affect awareness
information gathering behavior in terms of awareness check frequency. Then we look at how awareness
information checks correlate with the ratings of participants’ perceived task performance. Third, we examine
how the checks correlate with partner’s social attraction. Finally we examine how they influence the
participants’ ratings of their own and their partners’ annoyingness.


4.1 Manipulation check


One key underlying assumption for our design is that awareness information gathering actually does improve
task performance. To test this assumption, we examined the relationship between awareness check
frequency and participants’ performance and found a positive correlation between awareness check
frequency and task performance in the shape games (r [64] = 0.67, p < .0001, n = 66). Thus, our
manipulation was successful in that regard.
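
The statistic reported above has the form of a Pearson correlation with n − 2 degrees of freedom (64 for n = 66). A plain-Python sketch of that computation is given below for illustration. The function name is hypothetical, and the sketch ignores the nesting of participants in dyads that the mixed model procedure accounts for.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> tuple[float, int]:
    """Return Pearson's r and its degrees of freedom (n - 2)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y), n - 2
```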


4.2 Effect of cultural background and familiarity 


Our first set of hypotheses examines the main and interaction effects of participants’ cultural background 
and their familiarity with collaborator on the frequency of awareness information gathering behavior. 

We ran a mixed model analysis of variance on the natural log of awareness check frequency, since 
this variable was positively skewed. We used participants’ cultural background, familiarity of partners, and 
the interaction of the two as fixed effects; and pairs as well as individuals as random effects. The results 
are presented in Table 1. 


Dependent variable: Log of Awareness Checks

Variables                                   B              S.E.           DFDen    p
Intercept                                   2.92           0.11           29       ***
Individual culture (A)                      [illegible]    [illegible]    29       **
Familiarity (S)                             -0.12          0.11           29
Individual culture (A) x familiarity (S)    0.04           0.11           29


Table 1: Effects of culture and familiarity on the log of awareness check frequency. Notes: p-values: ** p
<= .01, *** p <= .001; A = American, S = Stranger


H1 predicted that American participants would perform awareness checks more often than Chinese
participants. As shown in Table 1, there was a main effect of culture (F [1, 29] = 8.50, p < .01).
Consistent with H1, American participants across the friend and stranger conditions
(M = 3.24, S.E. = 0.15) conducted awareness checks more frequently than Chinese
participants (M = 2.61, S.E. = 0.16) (see Figure 6).


Figure 6: Mean numbers of awareness checks as a function of familiarity and cultural background


H2 predicted that participants would gather awareness information more often from friends than from 
strangers. However, we did not find a significant effect of the familiarity between partners (F [1, 29] = 1.24, 
n.s.), nor an interaction between culture and familiarity on the number of awareness checks (F [1, 29] = 
0.14, n.s.). Therefore H2 is not supported. 


4.3 Effects of awareness information gathering behavior on perceived task performance 


Our second set of hypotheses looks at how the frequency of awareness checks affects the participants’ 
perceptions of their own and the collaborator’s task performance. 

H3a predicted a positive correlation between the frequency of awareness checks and participants’
ratings of their own perceived task performance. Consistent with this hypothesis, there was a
significant positive correlation between the two (r [64] = 0.35, p < .01).

H3b predicted a positive correlation between the frequency of awareness checks one’s partner
conducted and participants’ ratings of the perceived task performance of their partner. We found a
significant positive correlation between the two (r [64] = 0.32, p < .01). Therefore, both H3a and H3b were
supported, meaning increased awareness check frequency is associated with better perceived task performance,
for both the participants themselves and their partners.


4.4 Effect of awareness information gathering behavior on social attraction 


Our next hypothesis explores how the frequency of awareness checks affects the participants’ perceptions 
of their collaborator’s social attraction. 

Contrary to H4, which predicted a negative correlation, we found no significant correlation between
awareness checks and social attraction (r [64] = 0.14, n.s.). On the surface, this seems to conflict
with a previous study (D. Tang & Birnholtz, 2010); we discuss possible reasons in the Discussion
section.


4.5 Effect of awareness information gathering behavior on annoyingness 


The last set of hypotheses examines how the frequency of awareness checks affects the participants’ 
perceptions of their own and the collaborator’s annoyingness. We also included familiarity to see whether 
it interacts with awareness check frequency to influence such ratings.

H5a proposed that when people conduct more awareness checks, they would think they are more 
annoying to their partner. This hypothesis was supported (r [62] = 0.28, p = .02).

H5b predicted that people would think that awareness checks were more annoying to partners who
were strangers as opposed to friends. Consistent with this hypothesis, there was a significant positive 
correlation between the awareness checks and perceived annoyingness in stranger pairs (r [32] = 0.41, p = 
.02); but for friend-pairs this correlation was not significant (r [30] = 0.09, n.s.). Therefore, H5b was 
supported. 

H6a and H6b examined this relationship on the partner’s side. H6a predicted that the frequency of
awareness checks received from a partner would correlate positively with the perceived annoyingness of
one’s partner, but this was not supported (r [64] = 0.10, n.s.). H6b predicted that the correlation would be
stronger for stranger pairs, but this was not supported either, as neither correlation was significant (for
stranger pairs, r [32] = 0.16, n.s.; for friend pairs, r [30] = 0.05, n.s.).


5 Discussion 


We have presented an experimental examination of how cultural background and familiarity affect 
awareness checking, and how awareness checking, in turn, influences perceptions of one’s own and a 
partner’s task performance, annoyingness, and partner’s social attraction. In this section, we will discuss 
the implications of these findings for theory and design, as well as limitations and future research directions.


5.1 Theoretical implications 


As we hypothesized, American participants performed awareness checks more frequently than their Chinese 
counterparts. This is consistent with the theoretical view that American culture is more task-oriented and 
less relationship-oriented than Chinese culture. 

Hofstede’s seminal cultural dimensions (Hofstede, 1991), proposed in the last century, and especially
the individualism/collectivism dimension, have drawn increasing criticism in recent years. Many
believe that equating culture with countries does not adequately account for national heterogeneity
(e.g., McSweeney, 2002), and some believe that culture is such a constantly evolving concept that it has
already been reshaped by technology use and increased international interaction (Irani, Vertesi, Dourish,
Philip, & Grinter, 2010). While it is true that globalization and economic transformation have been
influencing some values in Chinese culture, our study shows that Chinese are still, as Hofstede and Triandis
predicted, more concerned about relationship maintenance than their American counterparts, who
are more task-oriented.

This also sheds light on the value-sensitive design (VSD) approach, in which accounts of human values
should be reflected throughout the design process (Borning & Muller, 2012). VSD mainly treats value as a
universal concept, but our study, like many others, has shown that at least the focus on task
completion and relationship development still differs greatly across cultures and plays an
important role in how people use communication technology. Using the awareness information system,
people from a task-oriented culture like the USA (other examples include Canada and the Northern European
countries) tend to gather awareness information more frequently than people from a relationship-oriented
culture like China (other examples include Japan and Korea). Such a difference is not due to one party’s
lack of task commitment or negative social intentions, but rather to the values and norms that are deeply
embedded in their particular society. Therefore, when incorporating values in the design process, as
Borning and Muller (Borning & Muller, 2012) also suggested, designers should take a pluralistic view of
value, acknowledging culture as an important source of variation in it.

Contrary to our expectations, awareness checks did not differ in frequency between stranger and friend
pairs. While it is possible that familiarity has no effect on awareness checking behavior, two alternative
explanations are also plausible. First, as Culpeper (Culpeper, 1996) puts it, in terms of politeness,
familiarity is entangled with intimacy and similarity. A simple dichotomous operationalization of friends
vs. strangers might have omitted not only the nuance in the spectrum of familiarity, but also nuance in the
dimensions of intimacy or similarity. Second, and relatedly, it is possible that our
manipulation of stranger and friend pairs was not as rigorous as desired. “Friends” may have been
acquaintances from class, for example, and “strangers” might expect to encounter one another in the
classroom, especially the Chinese participants, who are a minority group in this university. It would be
useful to replicate the study using stranger pairs with no possible prior or anticipated future interaction.

We did find, however, that people thought of themselves as more annoying when they conducted 
awareness checks frequently toward strangers than toward someone they knew. This finding supports the
idea that familiarity is an important factor when determining the appropriateness of certain behavior. 

One puzzling finding is the lack of correlation between awareness checks and assessments of social 
attraction, either overall or in the American or Chinese samples separately. There are several possible 
explanations for this. It could be, for example, that awareness checking actually had no effect on social 
attraction or that the measurement scale was not sensitive enough to capture any effects that did occur. As 
awareness checking has influenced social attraction in prior work (Huijnen et al., 2004; D. Tang & Birnholtz, 
2010), this merits further study in an intercultural context. 


5.2 Implications for design 


These findings have several implications for designers of collaboration systems to consider. The overarching 
theme is that it is important to consider the relational context of action. 

The finding that Chinese participants conducted fewer awareness checks than Americans — even 
when it would have benefited them to do so in terms of task performance — suggests two design possibilities. 
First, it suggests that evaluating awareness tools only on the extent to which they improve task performance, 
a common CSCW measure in past studies (Dabbish & Kraut, 2003, 2004), may not be as appropriate or 
useful in more relationship-focused contexts. Designers might consider ways to measure the perceived effects 
of such tools on social relationships, and to build features into these tools that allow for gathering 
information in relationally sensitive ways. One might, for example, allow for different types of virtual
approaches (e.g., subtle vs. more obvious) with different contacts, provide users from relationship-focused
cultures with suggestions on when it might be acceptable to interrupt, or even, in some cases, encourage
them to conduct awareness checks on somebody from another culture in order to proceed on the task.

Relating to this, the contrast between the results for participants’ own annoyingness and that of
their partners is also worth mentioning. Positive correlations were found between awareness check
frequency and participants’ ratings of their own annoyingness, but not between the checks and their
ratings of their partner’s annoyingness. This suggests that people apply different standards to themselves
and to others in judging the appropriateness of gathering awareness information, and that they are not
necessarily aware of it. It provides further support for integrating an encouraging mechanism when
designing awareness information systems, given that increased awareness checks come with better task
performance, both actual and perceived, as our results suggest. Such an encouraging mechanism can range
from setting the rules explicitly to visually presenting the contacts in a friendly and equal way.

With regard to working with friends and strangers, it is further noteworthy from a design standpoint 
that, for participants working with strangers, there was a positive relationship between awareness checks 
and one’s own perceived annoyingness. This highlights the tradeoffs involved in building systems that allow 
for improved task performance, but in which some valuable behaviors have possible negative social 
consequences. This suggests that, when collaborators are likely to be working with an unfamiliar partner, 
it may be useful for them, before and during their collaboration, to discuss norms around interruption and 
possible annoyances, and when these are acceptable. 

It may further be useful to consider these tradeoffs in the design of interfaces that are sensitive to 
whether people are working with known collaborators or strangers. Using histories from prior collaborations, 
for example, systems might help people by visualizing their behavior against past groups or personal history
(e.g., Leshed, Hancock, Cosley, McLeod, & Gay, 2007) to gauge whether or not they are actually being
annoying. This could help those who may be overly concerned with annoying their partner be more
productive, in a relationally sensitive way.


5.3 Limitations and future directions 


Our study aimed to replicate a real-world situation in which one had to make the choice between enhancing 
one’s productivity and nurturing relationships. As such, we engineered the task in a way that there was a 
significant performance advantage to regularly performing awareness checks. Although we found that the 
Americans and Chinese performed the awareness checks differently, we still lack a fine-grained 
understanding of where this difference originates. The literature suggests a few possible but untested
answers. For example, Nakane (Nakane, 2007) found that Japanese students usually avoid interrupting others
as a face-saving strategy; Chinese are quite similar in this respect. In the future, we hope to follow up with
interviews that probe the reasons that American and Chinese participants do or do not perform awareness
checks; this will also help us avoid the trap of oversimplified application of intercultural
theories.

A possible confounding factor in our study is that Chinese participants in the stranger pairs,
although they had no interaction history, just like their American counterparts, might still have felt
psychologically closer to each other, because the population of Chinese students at our university is much
smaller than that of Americans. Being a member of a minority group activates the salience of cultural
identity and may have made them feel closer to each other than two American strangers would have felt in
the same setting (Schmitt, Spears, & Branscombe, 2003). In addition, they may have had higher expectations
of future interaction, which could also have influenced their behaviors (Gibbs, Ellison, & Heino, 2006;
Walther, 1994).

Finally, awareness information gathering is just one example of many behaviors in the workplace that
are, on the one hand, beneficial for one’s own or the group’s overall task performance, but on the other hand,
present a potential threat to interpersonal relationships between collaborators. Being aggressive or
“pushy”, for example, may enhance team effectiveness but at the same time reduce the social attraction
of team members. Future research will be needed to test the generalizability of our findings to these similar
kinds of work behaviors.


6 Conclusion 


In summary, this study explored how awareness information gathering behavior may be influenced by one’s
own cultural background and familiarity with one’s partner. It suggests that members of
relationship-oriented cultures (e.g., China) conduct awareness checks less frequently than members of
task-oriented cultures (e.g., the United States), probably because task-oriented cultures legitimize
task-related behaviors, such as awareness information gathering, more than relationship-oriented cultures
do. The study also examined the effects of awareness checking behaviors on people’s perceptions of their
own and their partner’s task performance and annoyingness, as well as their partner’s social attraction.


Abstract

In this within-subjects study, we explored the role of trait affect, personality, and culture in an
individual’s information seeking behavior about the Edward Snowden case. We also considered how these
factors may affect an individual’s perception of risks related to Snowden’s actions. We used Amazon’s 
Mechanical Turk to conduct two surveys five weeks apart with respondents in both India and the U.S. 
After accounting for differences in age, education, and gender, early findings suggest that trait affect and 
personality are associated with how people acquire and understand information as well as the information 
sources they choose to use. We also found that culture played a significant role in shaping how our 
respondents perceived the Snowden case and the implications of risk associated with his actions. Since 
our study is explorative and our respondent sample was limited by our survey method, these findings 
warrant further analyses. 
1 Introduction


Edward Snowden, a former Booz Allen Hamilton contractor and National Security Agency (NSA) employee, 
leapt into the public eye on June 5, 2013. In a series of exposé articles, The Guardian shared Snowden’s 
story about U.S. intelligence programs PRISM and XKeyscore, detailing how the government had been 
collecting both telephone metadata and Internet records as a part of their surveillance. The Guardian’s 
timeline of Snowden’s release of information he previously swore to keep secret reads like a spy novel: four 
laptops used to gain access to highly classified materials; the appearance of a man carrying a Rubik’s cube; 
a week’s worth of interviews in a Kowloon hotel room; secret court orders; indictments of U.S. Internet 
giants; and statements from world leaders (Gidda, 2013). As this is being written, most people who have 
access to news sources know of his story, and many have formed opinions about his actions in breaking 
down walls of secrecy. For example, Snowden’s Wikipedia article has been edited more than 3,500 times 
since user Mboverload created it on June 9, 2013, and as of the writing of this paper, it has been viewed 
more than 3,513,600 times. Not surprisingly, it was also marked for deletion on June 10, 2013, and it 
continues to generate a great deal of debate: should Edward Snowden be classified as a dissident and
whistle-blower or a traitor?

This question raises many others, and how a person answers may depend not only upon what one knows,
but also upon how one has come to know it. In this exploratory study, we examine how individual
factors, information sources, and cultural values may be associated with how persons view Snowden and his 
behavior. 
2 Background


In this section, we first discuss affect, the type of affect under investigation in the current study, and the 
role it has on risk perceptions. This is followed by a discussion of personality dimensions and the role they 
may play in how individuals seek out information. 


2.1 Affect 


Affect, as a non-cognitive aspect of information seeking and sense-making, can influence perception and 
judgment. In this study, we focus on trait affect, a persistent type of affect. 


2.1.1 Affect, Mood, and Emotion 


Within the literature, affect has come to mean several different things and has often been used 
interchangeably with mood and emotion (Ekman & Davidson, 1994; Isen, 1984; N. Schwarz & Clore, 1996;
Waters, 2008). This is understandable in one respect since these are all interrelated concepts, but doing so 
makes it difficult to compare and judge the validity of studies. 

For purposes of this research, we distinguish three dimensions of affect. Emotion is characterized 
as a generally short-lived and intense reaction to an event or stimulus, whereas mood is longer-lasting and 
milder in degree (Isen, 1984). Both terms represent a type of affect and can be classified as affective states 
(Waters, 2008; Watson & Tellegen, 1985). Affective states include: fear, sadness, guilt, hostility, shyness, 
fatigue, surprise, joviality, self-assurance, attentiveness, and serenity (Watson & Clark, 1994). However, 
these dimensions represent only states of the broader concept of affect. 

State affect fluctuates over time and varies in intensity (Grös, Antony, Simms, & McCabe, 2007;
Watson, Clark, & Tellegen, 1988). Emotion, a short-lived type of affect, will generally vary considerably 
over relatively short time periods. Emotion(s) may ultimately become mood depending on the intensity, 
frequency, and overall context of the experienced emotion(s). 

In contrast to these affective states, trait affect represents a more stable and generally life-long type 
of affect (Grös et al., 2007; Watson et al., 1988). In many respects, it can be considered part of one’s
personality. In fact, research has supported the close relationship between trait affect and personality traits 
(Watson et al., 1988; Watson & Tellegen, 1985). Similar to personality, trait affect changes very little over 
time. One way to further conceptualize the difference between them is to think of trait affect as the baseline 
for state affect. An individual with a generally positive trait affect is more likely to have a positive mood 
and experience emotions that are more positive. 


2.1.2 Affect in Risk Perceptions 


Affect influences or alters how individuals perceive things. These altered perceptions have an effect on the 
decisions people make (Curry & Youngblade, 2006; Isen, 1984; E. J. Johnson & Tversky, 1983; Smith & 
Kirby, 2001; Waters, 2008). While there is a lack of consensus on the specific mechanisms by which affect 
influences risk decisions, there is nonetheless general agreement that this influence does exist (Bower, 1981; 
Clore, Gasper, & Garvin, 2001; Finucane, Alhakami, Slovic, & Johnson, 2000; J. Forgas, 1995, 2008; E. J. 
Johnson & Tversky, 1983; Kahneman, Slovic, & Tversky, 1982; Norbert Schwarz & Clore, 2003; Slovic, 
Finucane, Peters, & MacGregor, 2007). One of the primary ways in which affect influences risk decisions
is through its effect on how individuals perceive risk.

The primary mechanism through which affect influences risk perceptions is the optimistic bias 
(Borkenau & Mauer, 2006; Helweg-Larsen & Shepperd, 2001; Lerner & Keltner, 2001; Rhee, Ryu, & Kim, 
2005; Waters, 2008). Basically, those with a greater positive affect (and/or lower negative affect) will make 
more optimistic judgments related to risk than those with a higher negative affect (and/or lower positive 
affect). This is explained in part by the priming mechanism of affect. A few studies serve to further illustrate 
this bias. 
In an experiment involving a real risk-taking task using betting chips, Isen and Patrick (1983) found
that participants in the group with positive affect were more likely to engage in low-risk behavior by betting 
more than those in the neutral affect group. Once the level of risk increased, however, participants in the 
positive affect group were less likely to engage in the risk task compared to the other group (p. 199). The 
authors noted that these findings are consistent with earlier research suggesting that those who feel good 
will behave in a manner that preserves that feeling (see Isen & Simmonds, 1978). Their second experiment 
had findings contrary to this one, but this may have been due to the hypothetical nature of the risk scenario 
in the second experiment compared to the first experiment that involved real betting (p. 200). 

Other research using valence-based approaches has generally been consistent with the findings of
the first experiment (Isen & Geva, 1987; Isen, Nygren, & Ashby, 1988). Additionally, research that has
gone beyond valence-based approaches has found that specific emotions and dimensions of emotions (i.e.,
certainty) impact likelihood estimates as well (DeSteno, Petty, Wegener, & Rucker, 2000; Druckman &
McDermott, 2008). Thus, affect has the potential to influence risk perceptions in several different ways.
The formation of risk perceptions is important given their role in decision making.


2.1.3 Affect as Positive and Negative 


The predominant approaches taken in conceptualizing affect have been valence-based. This includes affect 
as either positive or negative on a bipolar continuum (E. J. Johnson & Tversky, 1983), and positive affect 
and negative affect as two distinct dimensions (George, 1989; Watson et al., 1988; Watson & Tellegen, 
1985). The former approach has largely been replaced by the latter in recent years due to its higher degree 
of convergent and discriminant validity (Watson & Clark, 1997). 

Positive affect is related to the frequency of pleasant events and satisfaction, whereas negative 
affect is related to stress and poor coping (Watson et al., 1988). An individual with high positive affect does 
not necessarily have low negative affect and vice versa as they are largely independent dimensions. Thus, 
it is possible for an individual to have high positive affect and high negative affect, simultaneously. 

In this study, we operationalize trait affect as two distinct constructs — trait positive affect and trait 
negative affect — while also acknowledging that there are other intricacies of trait affect that cannot be 
fully captured by these two constructs. 


2.2 Personality 


Since Carl Jung’s theory about personality was published in 1921, researchers have continued to rely on, 
investigate, and question his construct of extraversion/introversion (E/I) (Carrigan, 1960) and the role of 
personality in influencing individual differences. Jung was not the first psychologist to recognize personality 
types. Before Jung, both Jordan (1890) and Gross (1902) had explored psychological theories based on type 
(Hildebrand, 1958). Jung, however, is a definitive personality type theorist, and his concept of 
extraversion/introversion as an innate yet fluid continuum persists. 

Jung (1923) defined the two psychological types as being differentiated by the direction of their
interests. He believed the extraverted individual is one who is oriented toward the external object, whereas
the introvert turns away from the external object and is oriented toward the inner self. Jung also held that
the E/I construct is compensatory and that the conscious and unconscious balance one another. Therefore,
extraversion is defined by an outward-facing disposition and introversion by an inward-facing
disposition, but the two are not mutually exclusive. In addition to identifying the E/I construct,
Jung proposed combinations of functional types: thinking, feeling, sensing, and intuiting.

Today, the most popular measure of personality type is the Myers-Briggs Type Indicator (MBTI),
which is based on Jung’s personality theories. Although popular, the MBTI is not considered as accurate
or predictive as the Big Five, an instrument based on work done by Tupes and Christal in the 1960s and
revised by several different groups of researchers since (Tupes & Christal, 1961). Like the MBTI, the Big
Five relies on an individual’s responses to a series of statements to determine that individual’s degree of E/I
as well as to measure other personality traits such as agreeableness, conscientiousness, neuroticism, and
openness. In this study, participants responded to the Big Five Inventory, which is based on the Big Five
(Benet-Martinez & John, 1998; John, Donahue, & Kentle, 1991; John, Naumann, & Soto, 2008).


2.2.1 Biological Basis of Personality 


Personality is almost always measured via self-report. However, like Jung, most researchers believe
individuals are hard-wired (biologically destined) to be more or less extraverted or introverted. If an
individual’s general placement along the E/I continuum is inborn, then personality may be a more dominant 
trait than other demographic factors often considered in social science research. 

In 1967, Eysenck published early research about personality, citing cortical activity as the most 
salient factor in explaining the differences between extraverts and introverts (Eysenck & Eysenck, 1967a,
1967b). His work suggested introverts have more cortical activity and are, therefore, more sensitive to
external stimulation, whereas extraverts have less cortical activity and, therefore, seek external stimulation.

Building on Eysenck’s research, scientists have investigated physiological and neurological 
connections to personality type and environmental stimulation by mapping physical reactions as diverse as 
amounts of salivation (Corcoran, 1964); eye movements (Gray, 1970); skin conductance (Fowles, Roberts, 
& Nagel, 1977); caffeine-induced arousal and its effects on verbal performance (Gilliland, 1980); cerebral
blood flow (D. L. Johnson et al., 1999); electroencephalograms (EEG) and empathy (Gale, Edwards, Morris, 
Moore, & Forrester, 2001); and brain activity in individuals with high sensory processing sensitivity 
(Jagiellowicz et al., 2011). These studies indicate personality has a biological basis and may be a dominant, 
constant set of traits that contribute to individual differences. 


2.2.2 Personality, Information Seeking Behavior, and Sense-Making 


If personality is a biological, dominant, and relatively constant set of traits that shapes an individual’s 
perception, then personality may also contribute to how individuals create personal information frameworks, 
how they seek and make sense of information. A host of researchers have investigated how situational 
differences shape information seeking behavior and information management strategies, and several 
researchers (Miller & Jablin, 1991; Tidwell & Sias, 2005) have begun to consider how individual differences, 
including personality traits, contribute to information seeking and sense-making. 

Citing Miller & Jablin (1991) as an exception, Tidwell & Sias (2005) write, “robust and stable
personality traits are generally ignored in information-seeking research” (p. 54). However, Heinström (2005)
found that among university students completing their Master’s theses, extraverted students tended to seek
information by engaging in broad scanning rather than fast surfing or deep diving. While Heinström
concludes that personality alone does not determine information-seeking behavior, she notes that it does
create boundaries for how an individual seeks information (p. 244). Additionally, Heaton and Kruglanski
(1991) observed that, when time constraints are involved, introverts may feel a need for cognitive closure
and be less likely to process conflicting information once they have made decisions. They may also be more
likely to show negative affect toward people who disagree with their opinions (p. 165).

Other studies have considered the connection between personality traits and general information 
behavior (Heinström, 2003) and the consumption of political information (Gerber, Huber, Doherty, &
Dowling, 2011). Still other scholars have pursued research regarding personality and different uses of the
Internet, including both social and information gathering motivations (Amiel & Sargent, 2004; Hamburger 
& Ben-Artzi, 2000; Hills & Argyle, 2003; Kraut et al., 2002). This study continues the exploration of 
personality traits and information seeking, including the Internet as a key information source. 


3 Methods 


This exploratory study examines how individuals acquire information on a specific piece of content (the
Snowden case) and how the use of different sources may be associated with trait affect and personality type,
as well as their perceptions of risk as it relates to this specific piece of content. Because the Snowden case
is arguably dynamic and of global interest, we explore whether U.S. and non-U.S. responses differ and
whether responses change over a five-week period (July to August).


3.1 Survey Development 


The survey instrument for both time periods included questions about the use of different information 
sources for learning about the Edward Snowden situation. The questions asked about the use of 1) Blogs;
2) Online social media discussions; 3) Search engine news; 4) Online news services; 5) Television shows; 6)
Personal discussions and email exchanges; and 7) Newspapers (including online versions). Responses were
in the form of an anchored five-point Likert scale (1=not used at all; 2=used rarely in one week; 3=used at
least weekly; 4=used daily; and 5=used several times per day). Both surveys asked for the respondents’
degree of agreement with statements related to security and to how Snowden might be viewed (e.g., as 
publicity-seeker; courageous whistle-blower; etc.). Finally, both surveys included demographic questions. 

The first survey included questions to measure cultural values, trait positive affect, and trait 
negative affect. The cultural values questions are from Hofstede’s Values Survey Module 2008 (1984, 2008).

The primary measurement tool used to examine positive and negative affect has been the Positive 
and Negative Affect Schedule (PANAS) (Watson et al., 1988). PANAS has been the primary measurement 
tool in large part due to the extensive reliability testing and validation of this instrument (Waters, 2008). 
It has been used in a large number of studies to measure positive affect and negative affect and the 
relationship between these constructs and other constructs (Borkenau & Mauer, 2006; Curry & Youngblade, 
2006; Fedorikhin & Cole, 2004; Grindley, Zizzi, & Nasypany, 2008; Lu, Xie, & Zhang, 2013; Ntoumanis & 
Biddle, 1998; Treasure, Monson, & Lox, 1996; Vasey, Harbaugh, Mikolich, Firestone, & Bijttebier, 2013; 
Watson & Walker, 1996). 

The PANAS consists of 20 items with 2 scales: positive affect (10 items) and negative affect (10 
items) (Watson et al., 1988). Positive Affect consists of the descriptors active, alert, attentive, determined, 
enthusiastic, excited, inspired, interested, proud, and strong. Negative Affect consists of the descriptors 
afraid, scared, nervous, jittery, irritable, hostile, guilty, ashamed, upset, and distressed. The instrument 
itself has been validated with several different time instructions, including an instruction for participants 
to indicate how “you generally feel this way, that is, how you feel on the average” (Watson et al., 1988, p. 
1070). This time instruction is designed to measure trait affect, in particular trait positive affect and trait
negative affect, which are what the current research is concerned with measuring. Questions related to culture
were also included, but are outside of the scope of the current analysis.
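
To make the scale structure concrete, the following minimal sketch scores the two PANAS scales by summing their ten descriptors. The descriptor lists are taken from the text above; the 1-5 rating range and the function name `score_panas` are assumptions made for illustration and are not drawn from the study materials.

```python
# Descriptors for each PANAS scale, as listed in the text (Watson et al., 1988).
POSITIVE_ITEMS = ["active", "alert", "attentive", "determined", "enthusiastic",
                  "excited", "inspired", "interested", "proud", "strong"]
NEGATIVE_ITEMS = ["afraid", "scared", "nervous", "jittery", "irritable",
                  "hostile", "guilty", "ashamed", "upset", "distressed"]


def score_panas(ratings):
    """Return (trait positive affect, trait negative affect) as scale sums.

    `ratings` maps each descriptor to an integer rating. The 1-5 range is an
    assumption based on the usual PANAS response format.
    """
    for item in POSITIVE_ITEMS + NEGATIVE_ITEMS:
        if item not in ratings:
            raise ValueError(f"missing rating for {item!r}")
        if not 1 <= ratings[item] <= 5:
            raise ValueError(f"rating for {item!r} must be between 1 and 5")
    tpa = sum(ratings[item] for item in POSITIVE_ITEMS)
    tna = sum(ratings[item] for item in NEGATIVE_ITEMS)
    return tpa, tna


# Hypothetical respondent: rates every positive item 4 and every negative item 2.
example = {item: 4 for item in POSITIVE_ITEMS}
example.update({item: 2 for item in NEGATIVE_ITEMS})
print(score_panas(example))  # (40, 20)
```

Under these assumptions each scale score ranges from 10 to 50, and the two scores are computed independently of one another, in keeping with the treatment of positive and negative affect as distinct dimensions.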

The second survey was similar to the first, but in place of the questions on culture and affect, 
questions designed to measure personality types were included. The Big Five Inventory is a self-administered 
questionnaire designed to measure the five primary personality dimensions: extraversion, agreeableness, 
conscientiousness, neuroticism, and openness (Benet-Martinez & John, 1998; John et al., 1991, 2008). The 
participant must indicate their level of agreement with 44 phrases by using a five-point Likert scale. In the 
complex world of personality studies, the use of this instrument provides an acceptable compromise between 
length and validity. This is important given that some instruments may consist of more than 200 items 
(Costa & McCrae, 1985), which is not practical for most survey research, especially when other questions 
are included. 


3.2 Recruitment of Participants 


This study was conducted by recruiting participants using Amazon’s Mechanical Turk. The use of Amazon’s 
Mechanical Turk offers several advantages over other recruitment methods (e.g., students, word of mouth, 
flyers, and electronic postings). For example, turnaround time can be quite quick — all responses for each 
sample for the first survey in this particular study were collected in less than 24 hours. Furthermore, it is 
a cost-effective recruitment tool. In this study, participants were credited with 50 cents to their account for 
their participation in the first survey and 65 cents for participation in a follow-up survey. Finally, the
quality of responses obtained from participants using Amazon’s Mechanical Turk is generally high with 
only 4.17 percent of respondents failing a quality control question in one study, compared to 6.47 percent 
and 5.26 percent for participants from a university and Internet message board, respectively (Paolacci, 
Chandler, & Ipeirotis, 2010). The use of crowdsourcing has increased in popularity and acceptance for these 
reasons and others (Howe, 2006; Kittur, Chi, & Suh, 2008; Mahmoud, Baltrusaitis, & Robinson, 2012). 

However, it does have some drawbacks. For example, since the users are anonymous, quality control
can be quite difficult. Some participants may be “malicious workers” who are simply trying to finish the
task to receive payment (Ipeirotis, Provost, & Wang, 2010). While quality of responses is a concern with
this method, the concern is far from unique to it. Nonetheless, two simple and obvious quality control
questions, each with only one correct answer, were added to each survey to check for attention,
quality, and engagement in the study. The 77 participants (65 from India; 15 from the U.S.) in the first
survey who failed the quality control question had their data removed from further analysis. Ultimately,
different motives and biases may enter the picture due to the use of this method of recruitment; however,
this is a common problem for researchers in most recruitment methods employed.


3.3 Survey Administration 


The first survey was administered at the beginning of July. The second survey was administered
approximately five weeks later to individuals who had responded to the first survey. Of those who chose to
accept the offer and began the survey, 93 percent of those from India completed it (N=214), compared to
96 percent from the United States (N=172). Once we eliminated responses from participants who failed
both quality control questions, we had a remaining sample size of 150 for India (a failure rate of 29.25
percent) and 155 for the U.S. (a significantly lower failure rate of 8.82 percent).
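
As a quick arithmetic check on the exclusion figures, the sketch below recomputes the U.S. failure rate from the counts reported in this section (15 U.S. respondents excluded, 155 retained). It assumes the rate is defined as failed / (failed + retained); the function name is ours and purely illustrative.

```python
def failure_rate(failed, retained):
    """Share of completed responses excluded by the quality control check."""
    return failed / (failed + retained)


# U.S. counts as reported in the text: 15 excluded, 155 retained.
us_rate = failure_rate(failed=15, retained=155)
print(f"{us_rate:.2%}")  # 8.82%
```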

The second survey was administered approximately five weeks later and only participants that 
completed the first survey and passed the quality control questions were asked to participate in the second. 
Out of the original 155 U.S. participants, 110 completed the second survey. After cleaning the data by 
removing responses that failed the quality control questions, as well as those that provided incomplete or 
incorrect Mechanical Turk Worker IDs, the final sample size from the U.S. for this study is 101. 

The same process was followed for the participants from India. Out of the original 150 participants 
from India, 139 completed the second survey. After the data was cleaned, we ended up with a final sample 
size from India of 107. There was not a statistically significant difference in any of the demographic 
categories between those that successfully completed the first survey and those that successfully completed 
the second one. Additionally, all of the participants in this study completed both surveys. 

This data suggests a relatively high response rate for this type of methodology given that paper-based
mail surveys generally have a response rate of under two percent (Kotulic & Clark, 2004), with
Internet surveys generally even lower (Shih & Fan, 2008), although the participants from
Amazon’s Mechanical Turk are likely more motivated than the general Internet population to complete
such surveys. Regardless, the possibility of effects from non-response bias cannot be ruled out. Furthermore,
in a study that includes personal questions related to perceptions of an individual embroiled in a national
security matter, as well as questions related to an individual’s affect and personality, we believe the web-based
format of the survey is the best method to employ in order to minimize social desirability bias. The samples
should not be considered representative of the populations of the two countries. Table 1 shows significant
differences in age, gender, and educational levels between the U.S. sample and the U.S. population, but our
sample provides a satisfactory range of ages and educational attainment levels for an exploratory study.
                                  India Sample   U.S. Sample   U.S. Population¹
Sample Size                       107            101           --
Age
  18-29                           41.12%         37.62%        22.0%
  30-39                           40.19%         30.69%        17.0%
  40-49                           8.41%          14.85%        18.2%
  50-59                           4.67%          9.90%         18.1%
  60+                             5.61%          6.93%         24.7%
Gender
  Male                            64.49%         63.37%        49.1%
  Female                          35.51%         36.63%        50.9%
Education
  Some High School                0.93%          0.99%         8.58%
  High School (or GED)            0.93%          8.91%         30.01%
  Some College                    9.35%          32.67%        19.46%
  College Graduate                57.01%         47.52%        27.59%
  Master’s / Professional Degree  31.78%         6.93%         8.4%
  Doctorate                       --             2.97%         1.36%


Table 1: Age, Gender, and Educational Attainment Levels 
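
The paper does not state which test underlies the sample-versus-population comparison in Table 1. As one possible illustration, the sketch below runs a chi-square goodness-of-fit calculation on the U.S. age distribution. The counts are recovered from the Table 1 percentages for the U.S. sample of 101; the choice of test, the function name, and the use of a tabulated critical value are all assumptions.

```python
# U.S. sample counts recovered from the Table 1 percentages (n = 101),
# for the age bands 18-29, 30-39, 40-49, 50-59, and 60+.
observed = [38, 31, 15, 10, 7]
# U.S. population shares for the same age bands, from Table 1.
population_share = [0.220, 0.170, 0.182, 0.181, 0.247]


def chi_square_gof(observed_counts, expected_shares):
    """Chi-square goodness-of-fit statistic against expected proportions."""
    n = sum(observed_counts)
    expected = [n * share for share in expected_shares]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))


chi2 = chi_square_gof(observed, population_share)
CRITICAL_01_DF4 = 13.277  # tabulated chi-square critical value, alpha = .01, df = 4
print(round(chi2, 2), chi2 > CRITICAL_01_DF4)
```

Under these assumptions the statistic falls well above the critical value, which is in line with the statement that the U.S. sample skews younger than the U.S. population.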


4 Findings and Discussion 
The primary purpose of this study is to examine possible relationships of trait affect and personality
types with differences in information seeking behavior and risk perception. In particular, we are interested
in determining whether there is a statistically significant relationship of trait affect and personality types
with the types of information sources individuals use to become informed on a specific news item, as well
as with their risk perceptions related to this news item. Likewise, we are also interested in determining
whether the type of information sources used varies over a short period of time (in this case, approximately
five weeks). However, this exploratory study has multiple facets, one of which is a comparison of
participants from India with those from the U.S. In this section, we will first examine whether or not the
information sources used, risk perceptions, and both trait affect and personality types are related to
the country in which the participants reside. Next, we will look at the information sources used and how
they may be related to both trait affect and personality types. Then, we will explore risk perceptions
in a similar manner, including whether or not these risk perceptions are related to the information sources
participants use to learn about this news item. Finally, we will examine the extent to which the information
sources used may have changed over a five-week time period.


4.1 Differences between the U.S. and India 

The U.S. and India have unique cultures, customs, traditions, and challenges. In this section, we explore 
how some of these differences may inform their perceptions of Edward Snowden and his actions, information 
sources used to learn about the news story, as well as differences in the participants that completed the 
surveys with respect to trait affect and personality types. 


1 Source: U.S. Census Bureau, 2011/2012 
First, we examine whether there are statistically significant differences between the U.S.
participants and the participants from India with respect to awareness of the Edward Snowden situation
and some overall perceptions of his actions. Participants from the U.S. are much more likely to be aware of
the Snowden situation than those in India. Responses from those unaware of the Snowden case were not
included in further analyses. While this difference in awareness may not be surprising, it is interesting how
different the perceptions of Snowden are between the two groups of participants. In particular, participants
from India feel more strongly that Snowden broke the laws of the U.S. and deserves to be tried in court and
that he is a publicity seeker who hopes for personal gain from his actions. These results are presented in the
table that follows.


                                     U.S.               India              t statistic
Aware of Edward Snowden Situation    92.1%              64.5%              6.449**
[illegible] Individual               M=3.65; s=1.165    M=3.71; s=0.965    --
Broke the Laws                       M=3.01; s=1.214    M=3.37; s=1.087    2.702**+
Publicity Seeker                     M=2.32; s=1.217    M=3.37; s=1.226    7.581**+
** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) 
-- Not Significant + Equal variances assumed 


Table 2: Differences between the U.S. and India - Awareness and Perceptions 


Second, we explore whether risk perceptions related to Snowden’s actions are different based on the country 
in which the participant resides. For most of the questions, we find a statistically significant difference 
between the two groups of participants. Specifically, for five of the six questions that showed a statistically 
significant difference, participants from India provided ratings indicating a greater level of agreement with 
the statement. This is true for questions considered more risk seeking (e.g., “make me feel personally more 
secure”), as well as those considered more risk averse (e.g., “have damaged U.S. national security”). 

The one exception is the question with the statement: “make me less confident in my government’s 
oversight of our nation’s security”. Whether these responses reflect the true opinions of participants or 
perhaps simply a difference in levels of baseline agreeableness is unclear. However, as we will discuss
momentarily, the personality dimension that measures agreeableness differs significantly between the two
countries. The table that follows presents the findings on differences in risk perceptions.


                                               U.S.               India              t statistic
More Secure Personally                         M=2.58; s=1.072    M=3.34; s=1.042    6.274**+
Damaged U.S. Security                          M=2.73; s=1.263    M=3.31; s=1.208    4.125**+
Damaged All Democratic Nations’ Security       M=2.23; s=1.155    M=2.94; s=1.202    5.311**+
Less Confident in Government                   M=3.79; s=1.057    M=3.24; s=1.156    4.291**
Stronger and More Secure U.S.                  M=3.19; s=1.114    M=3.88; s=1.011    --
Negatively Affects All Democratic Societies    M=2.31; s=1.163    M=3.05; s=1.230    5.498**+
Will Make Little Difference in our Security    M=2.97; s=1.049    M=3.41; s=0.985    3.786**+


** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) 
-- Not Significant + Equal variances assumed 


Table 3: Differences between the U.S. and India - Risk Perceptions 
Next, we look at differences in how information sources are used. Similar to the above finding, participants
from India report much higher use of the various information sources. This may be an artifact
of the specific participants from India who took part in the study, but it may also be indicative of a
greater level of engagement with news than their U.S. counterparts. Finally, as noted above, it is possible
that participants from India are simply more agreeable. We will explore this next.


                                          U.S.               India              t statistic
Blogs                                     M=1.88; s=1.001    M=2.57; s=1.107    5.702**
Online Social Media Discussion            M=2.02; s=1.054    M=3.29; s=1.229    9.610**
Search Engine News                        M=2.40; s=1.180    M=3.56; s=1.076    8.895**+
Online News Services                      M=2.60; s=1.166    M=3.30; s=1.137    5.352**+
TV Shows                                  M=2.40; s=1.208    M=3.40; s=1.225    7.221**+
Personal Discussions & Email Exchanges    M=1.69; s=1.011    M=2.69; s=1.249    7.581**
Newspapers                                M=2.08; s=1.070    M=3.82; s=0.929    15.445**
** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) 
-- Not Significant + Equal variances assumed 


Table 4: Differences between the U.S. and India - Information Sources Used 


In our final examination of differences between participants from the U.S. and India, we can see that
participants from India have significantly higher levels of trait positive affect. Generally speaking,
individuals with higher levels of trait positive affect embrace life with energy,
have higher levels of confidence and enthusiasm, and enjoy the company of others (Watson, Clark, McIntyre,
& Hamaker, 1992). Furthermore, these individuals are more likely to have higher levels of the personality
type extraversion (Watson et al., 1992; Watson & Clark, 1994, 1997). Not surprisingly, we find that the
participants from India do in fact have higher levels of this personality type. Likewise, they also show
significantly higher levels of the personality type “agreeableness”. As noted earlier, this may
account for the overall higher ratings provided by participants from India. This is something that should
be explored in future research.


Measure    U.S.                India               t statistic
TPA        M=27.17; s=7.970    M=40.20; s=6.938    12.533**+
TNA        M=14.17; s=6.636    M=15.93; s=6.877    --
BFI-E      M=2.67; s=0.960     M=3.56; s=0.672     7.706**
BFI-A      M=3.74; s=0.749     M=4.06; s=0.606     3.318**+
BFI-C      M=3.97; s=0.746     M=4.05; s=0.719     --
BFI-N      M=2.49; s=0.982     M=2.36; s=0.779     --
BFI-O      M=38.60; s=0.697    M=38.67; s=0.449    --
** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed)
-- Not Significant + Equal variances assumed


Table 5: Differences between the U.S. and India - Trait Affect and Personality Dimensions 


4.2 Information Sources Used and their Relationship with Trait Affect and Personality Dimensions 


In this section, we consider whether there is a relationship between the extent to which certain information
sources are used and both trait affect and personality types. While no specific pattern emerges
here, trait positive affect and the personality type openness are correlated with several of the different
information sources. Additionally, the personality type extraversion is correlated with search engine news
for both U.S. and India participants. The only other combination for which this holds in both groups is
trait positive affect and online news services. This suggests that trait affect and personality types may be
related to the information sources individuals choose to use, but in largely different ways for the two
groups of participants.


Rate how much you have used each of the following sources of information to learn about Edward J.
Snowden, his disclosure of U.S. surveillance activities, and his legal situation:


Country TPA TNA BFI-E BFI-A


U.S. -- oer a na 
Blogs 
ee India .305* z as z 
Online social media U.S. = PN a a: 
discussions India 328** -- E = 
U.S. .221* =- .217* -- 
Search Engine News : 
India -- RE 277* = 
S. 386** = -.279** ae = 
Online News Services i 3 286 ce 
India .253* = ee = 
Television shows USS. -- = per = 
(including online TV , 
; India = = = = 
sites) 
Personal discussions U.S. = = 2 = 
and email exchanges India .306* -- = -- 


Newspapers (including U.S. -- -- ie = 


online versions) India -- -- -- -- 


BFI-C BFI-N BFI-O 


** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) -- Not Significant 


Table 6: Information Sources Used and their Relationship with Trait Affect and Personality Dimensions 
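
The per-country correlations reported in Table 6 and the tables that follow can be illustrated with a short sketch. It computes Pearson's r separately for each country and estimates a two-tailed p-value with a permutation test. The permutation test is a substitute chosen here so that the example needs only the Python standard library; it is not necessarily the significance test used in this study. The field names (`country`, `tpa`, `use`) and the toy records in the usage note are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-tailed permutation p-value for the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def correlations_by_group(rows, x_key, y_key, group_key="country"):
    """Return {group: (r, p)} for the two variables within each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    results = {}
    for name, members in groups.items():
        x = [m[x_key] for m in members]
        y = [m[y_key] for m in members]
        results[name] = (pearson_r(x, y), permutation_p(x, y))
    return results
```

A call such as `correlations_by_group(rows, "tpa", "use")`, where each element of `rows` is a dictionary holding one respondent's country, trait positive affect score, and usage rating for one information source, returns one (r, p) pair per country. The sketch does not handle a variable with zero variance within a group.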


4.3 Risk Perceptions and their Relationship with Information Sources Used, Trait Affect, and Personality Dimensions


Next, we examine several different possible relationships with the questions related to risk perceptions and
other perceptions about the Snowden situation. First, we explore possible relationships between risk
perceptions and both trait affect and personality types. In this analysis, we find that the belief that
Snowden’s actions have damaged U.S. national security is correlated with trait positive affect for U.S.
participants and with the personality traits conscientiousness and openness for India participants, albeit
in a negative direction. Additionally, participants in India with higher levels of trait positive affect and
the personality type conscientiousness are less likely to believe that Snowden’s actions negatively affect
all democratic societies. Finally, participants in the U.S. with higher levels of the personality type
agreeableness are less likely to believe that Snowden’s actions will make little difference in our security
as a society, while the opposite is true for those with higher levels of the personality type neuroticism.


In my view, Mr. Snowden’s actions... 


Country TPA TNA BFI-E BFI-A 


..make me feel U.S. -- -- ae a 
personally more secure. India = -- fa ne 
„have damaged U.S. U.S. .323** -- -- ae 
national security India -- a = ES 

U.S. = -- a —- 


BFI-C BFI-N BFI-O
..have damaged all


democratic nations’ India -- -- -- E -- - ae 
security 

..make me less US. a z a oe = a ae 
confident in my 


government’s oversight India 
of our nation’s security. 

„in the long run will US. z a Pa a e = has 
make for a stronger and 

f India -- -- -- -- -- -- -- 
more secure U.S. society 
„negatively affects all U.S. = = = m = a 


democratic societies, 


: 7 * = = ES F * = = 
U.S. and others India -248 .255 
will make little US. z z -- -.247* = .303** -- 
difference in our 

India z = re as E os _ 


security as a society 


** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) -- Not Significant


Table 7: Risk Perceptions and their Relationship with Trait Affect and Personality Dimensions 


In our exploration of a few more general perceptions related to the Snowden situation, we find that 
participants in the U.S. with higher levels of trait positive affect are less likely to believe that Snowden is 
a courageous individual who followed his conscience. Interestingly, the personality type openness was related 
to all three questions for India participants. In particular, participants from India with higher levels of the 
personality type openness are more likely to view Snowden in a favorable light. 


In my view, Mr. Snowden... 


Statement                              Country    TPA       TNA    BFI-E    BFI-A    BFI-C    BFI-N    BFI-O
..is a courageous individual who       U.S.       -.225*    --     --       --       --       --       --
followed his conscience.               India      --        --     --       --       --       --       .390**
..broke the laws of the U.S. and       U.S.       --        --     --       --       --       --       --
thus deserves to be tried in court.    India      --        --     --       --       --       --       -.248*
..is a publicity seeker and hopes      U.S.       --        --     --       --       --       --       --
for personal gain from his actions.    India      --        --     --       --       --       --       -.252*
** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) -- Not Significant
Table 8: Perceptions and their Relationship with Trait Affect and Personality Dimensions 


As this study is exploratory in nature, we are not in a position to declare that use of specific information
sources leads one to have certain perceptions related to a news item. Nonetheless, the findings here are
interesting as there are several correlations throughout. For example, participants from the U.S. who have
followed the Snowden situation in part through TV shows are less likely to believe that Snowden’s actions
make them feel personally more secure. In contrast, the opposite is true for those who have followed online
news services a fair amount. The relationship between TV shows and risk perceptions is found in two other
questions. In each case, those who have learned more about the Snowden situation through TV shows are
more likely to believe his actions were not good for them personally or for the U.S. as a whole.

The opposite is true for U.S. participants that have learned about the Snowden situation in part 
through online social media discussions. Specifically, these individuals are more likely to think Snowden’s 
actions were good for society. The implications of these findings here are unclear, but they do raise some 
interesting questions for future research. For example, does the information source used matter? Or, is it 
that individuals who choose to use a particular information source are the same ones that are risk averse 
(or risk seeking)? 


In my view, Mr. Snowden’s actions... 


Statement  Country  Blogs  Online Social Media Discussion  Search Engine News  Online News Services  TV Shows  Personal Discussions & Email Exchanges  Newspapers
..make me feel U.S. -- -- -- 165* = -.151* -- -- 
personally more Indi s444 gyp  191* 182 .246** ż 
secure. a 
have damaged U.S. -- -- -- .180* -- -.166* -- 
U.S. national Indi 
security a 
have damaged U.S. -- -.149* -- -- -- -.205** -- 
all democratic Indi 
nations’ security a > E E E 7 E E 
..make me less US. = 7 D _ E T PA 
confident in my 
government’s Indi . 
oversight of our .200 -- -- -- -- -- -- 
nation’s security. 
in the long run US. B 159* gi n _162* P p 
will make for a 
stronger and more Indi 
secure U.S. 1345 ** .247** .215* -- -- -- -- 
a 
society 
„negativel 
oe US. -.160* — -.157* 5 -- 155+ 5 4 
affects all 
d , 
emocratic Indi 
societies, U.S. and a -- -- -- -- -- -- -- 
others 
.. will make little U.S. -- -- = -- -- = -- 
difference in our 
: Indi 
security as a es .196* = = ei a as 
a 


society 
** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) -- Not Significant 


Table 9: Risk Perceptions and their Relationship with Information Sources Used 
Some of the same general trends noted previously are also found for these three questions. Again,
participants in the U.S. that have learned about the Snowden situation in part through TV shows are less
likely to have a favorable opinion of his actions and the overall effect these actions have on security. There
is some suggestion below that this may not be true for participants from India.


In my view, Mr. Snowden... 


Statement  Country  Blogs  Online Social Media Discussion  Search Engine News  Online News Services  TV Shows  Personal Discussions & Email Exchanges  Newspapers
iS a Courageous U.S. .160* .162* -- -- -.166* .240** -- 


individual who : 

. ; India -- -- = -- = = 2 
followed his conscience. 
..broke the laws of the U.S. -- -- at -- .218** ee a 
U.S. and thus deserves 
to be tried in court. 
«is a publicity seeker U.S. -- -- -- -- -- -.198* -- 


and hopes for personal 


India -- -- ae = sT ea z 


: i . India -- .185* = eo = 267%** _ 
gain from his actions. 


** Significant at the .01 level (2-tailed) * Significant at the .05 level (2-tailed) -- Not Significant


Table 10: Perceptions and their Relationship with Information Sources Used 


4.4 Consistency of Information Sources Used and Perceptions Over Five Weeks 


Finally, we examine whether a short time span of approximately five weeks results in changes
in perceptions of Snowden or in the information sources used to learn about the situation. The analysis
indicates that there was no statistically significant difference in these items between survey one and survey
two. A longer interval between surveys might produce a different result, which may be worth exploring
in future research.


5 Conclusion 


This exploratory study demonstrates statistically significant associations between non-cognitive factors
and information acquisition and interpretation. This suggests that research on perceptions
and information behavior should take these non-cognitive factors into account in research design and in
the analysis of results. Ignoring these factors means that research findings may be less robust than findings
from studies that incorporate them.

Specifically, this exploratory study illustrates that how individuals perceive (security) wall breaches,
in particular the actions taken by Edward Snowden in revealing U.S. secrets, is associated with
differences in the non-cognitive factors of trait affect and personality. The correlations among the values
from our sample are statistically significant and measurable, but we cannot say how extensive these
influences may be with other samples or samples from other populations. In our study spanning five weeks,
we see a consistent set of responses identifying perceptions and sources of information.

As with many exploratory studies, this one has raised more questions than it has answered. What
the study has contributed is an awareness that non-cognitive aspects of perception and sense-making,
particularly trait affect and personality dimensions, are important in understanding how people may
acquire and interpret information.


5.1 Implications


Individuals perceive personal security issues differently and react differently to the actions taken by
individuals who may act to reveal secrets. Security, or more precisely the perception of security, is
constructed differently by different individuals, and this construction depends on personality and trait affect
in addition to the information sources available. The results of this study, however, have implications that
go beyond the current issue of Snowden and his actions on security. They raise questions about how
individuals’ trait affect and personality dimensions may be associated with information behaviors and
sense-making.


5.2 Limitations 


This exploratory study has numerous constraints that limit the generalizability of the responses. The
samples are not representative samples of the populations of India and the U.S. The choice of Mechanical
Turk means that there are inherent limitations (including response biases) that can limit the validity of the
findings in other contexts.


5.3 Future Work 


Despite its limitations, this study provides motivation for future studies that examine in more detail how 
personality dimensions and trait affect are associated with choices of information sources, sense-making, 
and judgments. The results suggest that models that seek to explain the variance among these dimensions 
could be fruitful. Data on cultural values — not examined in this paper — also may prove useful additions 
to such a variance model. 


6 References 


Amiel, T., & Sargent, S. L. (2004). Individual differences in Internet usage motives. Computers in Human 
Behavior, 20(6), 711-726. 

Benet-Martinez, V., & John, O. P. (1998). Los Cinco Grandes across cultures and ethnic groups: Multitrait- 
multimethod analyses of the Big Five in Spanish and English. Journal of personality and social 
psychology, 75(3), 729. 

Borkenau, P., & Mauer, N. (2006). Personality, emotionality, and risk prediction. Journal of Individual 
Differences, 27(3), 127-135. 

Bower, G. H. (1981). Mood and memory. American Psychologist, 36(2), 129-148. 

Carrigan, P. M. (1960). Extraversion-introversion as a dimension of personality: A reappraisal. 
Psychological Bulletin, 57(5), 329. 

Clore, G. L., Gasper, K., & Garvin, E. (2001). Affect as Information. In J. P. Forgas (Ed.), Handbook of 
Affect and Social Cognition (pp. 121-144). Mahwah, N.J.: L. Erlbaum Associates. 

Corcoran, D. W. J. (1964). The relation between introversion and salivation. The American Journal of 
Psychology, 77(2), 298-300. 

Costa, P. T., & McCrae, R. R. (1985). The NEO personality inventory: Manual, form S and form R. 
Psychological Assessment Resources. 

Curry, L. A., & Youngblade, L. M. (2006). Negative affect, risk perception, and adolescent risk behavior. 
Journal of Applied Developmental Psychology, 27(5), 468-485. doi:10.1016/j.appdev.2006.06.001 

DeSteno, D., Petty, R. E., Wegener, D. T., & Rucker, D. D. (2000). Beyond valence in the perception of 
likelihood: The role of emotion specificity. Journal of Personality and Social Psychology, 78(3), 397- 
416. 

Druckman, J., & McDermott, R. (2008). Emotion and the Framing of Risky Choice. Political Behavior, 
30(3), 297-321. 


Ekman, P., & Davidson, R. J. (1994). The nature of emotion: Fundamental questions. Oxford
University Press.

Eysenck, S. B., & Eysenck, H. J. (1967a). Salivary response to lemon juice as a measure of introversion. 
Perceptual and motor skills, 24(3c), 1047-1053. 

Eysenck, S. B., & Eysenck, H. J. (1967b). Physiological reactivity to sensory stimulation as a measure of 
personality. Psychological reports, 20(1), 45-46. 

Fedorikhin, A., & Cole, C. A. (2004). Mood effects on attitudes, perceived risk and choice: Moderators and 
mediators. Journal of Consumer Psychology, 14(1-2), 2-12. 

Finucane, M. L., Alhakami, A., Slovic, P., & Johnson, S. M. (2000). The affect heuristic in judgments of 
risks and benefits. Journal of Behavioral Decision Making, 13(1), 1-17. 

Forgas, J. (1995). Mood and judgment: the affect infusion model (AIM). Psychological bulletin, 117(1), 39- 
66. 

Forgas, J. (2008). Affect and Cognition. Perspectives on Psychological Science, 3(2), 94-101. 

Fowles, D. C., Roberts, R., & Nagel, K. E. (1977). The influence of introversion/extraversion on the skin 
conductance response to stress and stimulus intensity. Journal of Research in Personality, 11(2), 
129-146. 

Gale, A., Edwards, J., Morris, P., Moore, R., & Forrester, D. (2001). Extraversion—introversion, 
neuroticism-stability, and EEG indicators of positive and negative empathic mood. Personality and 
Individual Differences, 30(3), 449-461. 

George, J. M. (1989). Mood and absence. Journal of Applied Psychology, 74(2), 317-324. 

Gerber, A. S., Huber, G. A., Doherty, D., & Dowling, C. M. (2011). Personality traits and the consumption 
of political information. American Politics Research, 39(1), 32-84. 

Gidda, M. (2013, July 25). Edward Snowden and the NSA files—timeline. The Guardian (London). Retrieved 
from http://www.theguardian.com/world/2013/jun/23/edward-snowden-nsa-files-timeline 

Gilliland, K. (1980). The interactive effect of introversion-extraversion with caffeine induced arousal on
verbal performance. Journal of Research in Personality, 14(4), 482-492.

Gray, J. A. (1970). The psychophysiological basis of introversion-extraversion. Behaviour research and 
therapy, 8(3), 249-266. 

Grindley, E. J., Zizzi, S. J., & Nasypany, A. M. (2008). Use of Protection Motivation Theory, Affect, and 
Barriers to Understand and Predict Adherence to Outpatient Rehabilitation. Physical Therapy, 
88(12). 

Gros, D. F., Antony, M. M., Simms, L. J., & McCabe, R. E. (2007). Psychometric properties of the State- 
Trait Inventory for Cognitive and Somatic Anxiety (STICSA): Comparison to the State-Trait 
Anxiety Inventory (STAI). Psychological Assessment, 19(4), 369-381. 

Hamburger, Y. A., & Ben-Artzi, E. (2000). The relationship between extraversion and neuroticism and the 
different uses of the Internet. Computers in Human Behavior, 16(4), 441-449. 

Heaton, A. W., & Kruglanski, A. W. (1991). Person perception by introverts and extraverts under time 
pressure: Effects of need for closure. Personality and Social Psychology Bulletin, 17(2), 161-165. 

Heinström, J. (2003). Five personality dimensions and their influence on information behaviour. Information
Research, 9(1), 9-1.

Heinström, J. (2005). Fast surfing, broad scanning and deep diving: The influence of personality and study
approach on students’ information-seeking behavior. Journal of documentation, 61(2), 228-247.

Helweg-Larsen, M., & Shepperd, J. A. (2001). Do Moderators of the Optimistic Bias Affect Personal or 
Target Risk Estimates? A Review of the Literature. Personality and Social Psychology Review, 
5(1), 74-95. doi:10.1207/S15327957PSPR0501_5

Hildebrand, H. P. (1958). A Factorial Study of Introversion-Extraversion. British Journal of Psychology, 
49(1), 1-11. 




Hills, P., & Argyle, M. (2003). Uses of the Internet and their relationships with individual differences in 
personality. Computers in Human Behavior, 19(1), 59-70. 

Hofstede, G. (1984). Culture’s consequences: International differences in work-related values (Vol. 5). Sage.

Hofstede, G. (2008). Values Survey Module 2008. Retrieved from http://www.geerthofstede.com/vsm-08 

Howe, J. (2006). The Rise of Crowdsourcing. Wired, 14(6). Retrieved from 
http://www.wired.com/wired/archive/14.06/crowds.html 

Ipeirotis, P. G., Provost, F., & Wang, J. (2010). Quality management on Amazon Mechanical Turk. In 
Proceedings of the ACM SIGKDD Workshop on Human Computation (pp. 64-67). Washington 
DC: ACM. 

Isen, A. M. (1984). Toward understanding the role of affect in cognition. In R. S. Wyer & T. K. Srull (Eds.), 
Handbook of social cognition (pp. 179-236). Hillsdale, N.J.: L. Erlbaum Associates. 

Isen, A. M., & Geva, N. (1987). The Influence of Positive Affect on Acceptable Level of Risk: The Person 
with a Large Canoe Has a Large Worry. Organizational Behavior & Human Decision Processes, 
39(2). 

Isen, A. M., Nygren, T. E., & Ashby, F. G. (1988). Influence of positive affect on the subjective utility of 
gains and losses: It is just not worth the risk. Journal of Personality and Social Psychology, 55(5), 
710-717. 

Isen, A. M., & Patrick, R. (1983). The Effect of Positive Feelings on Risk Taking: When the Chips Are 
Down. Organizational Behavior & Human Performance, 31(2). 

Isen, A. M., & Simmonds, S. F. (1978). The Effect of Feeling Good on a Helping Task that is Incompatible 
with Good Mood. Social Psychology, 41(4), 346-349. 

Jagiellowicz, J., Xu, X., Aron, A., Aron, E., Cao, G., Feng, T., & Weng, X. (2011). The trait of sensory 
processing sensitivity and neural responses to changes in visual scenes. Social Cognitive and 
Affective Neuroscience, 6(1), 38-47. 

John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The big five inventory—versions 4a and 54. Berkeley: 
University of California, Berkeley, Institute of Personality and Social Research. 

John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the integrative big five trait taxonomy. 
Handbook of personality: Theory and research, 3, 114-158. 

Johnson, D. L., Wiebe, J. S., Gold, S. M., Andreasen, N. C., Hichwa, R. D., Watkins, G. L., & Ponto, L. 
L. B. (1999). Cerebral blood flow and personality: a positron emission tomography study. American 
Journal of Psychiatry, 156(2), 252-257. 

Johnson, E. J., & Tversky, A. (1983). Affect, generalization, and the perception of risk. Journal of 
Personality and Social Psychology, 45(1), 20-31. doi:10.1037/0022-3514.45.1.20

Jung, C. G. (1923). Psychological types: or the psychology of individuation. 

Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under Uncertainty: Heuristics and Biases. 
Cambridge University Press. 

Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In Proceedings 
of the twenty-sixth annual SIGCHI conference on Human factors in computing systems (pp. 453- 
456). Florence, Italy: ACM. 

Kotulic, A. G., & Clark, J. G. (2004). Why there aren’t more information security research studies.
Information & Management, 41(5). 

Kraut, R., Kiesler, S., Boneva, B., Cummings, J., Helgeson, V., & Crawford, A. (2002). Internet paradox 
revisited. Journal of social issues, 58(1), 49-74. 

Lerner, J. S., & Keltner, D. (2001). Fear, anger, and risk. Journal of Personality and Social Psychology, 
81(1), 146-159. doi:10.1037/0022-3514.81.1.146 

Lu, J., Xie, X., & Zhang, R. (2013). Focusing on appraisals: How and why anger and fear influence driving 
risk perception. Journal of Safety Research, 45(0), 65-73. doi:10.1016/j.jsr.2013.01.009 




Mahmoud, M. M., Baltrusaitis, T., & Robinson, P. (2012). Crowdsouring in emotion studies across time 
and culture. In Proceedings of the ACM multimedia 2012 workshop on Crowdsourcing for 
multimedia (pp. 15-16). Nara, Japan: ACM. 

Miller, V. D., & Jablin, F. M. (1991). Information seeking during organizational entry: Influences, tactics, 
and a model of the process. Academy of Management Review, 16(1), 92-120. 

Ntoumanis, N., & Biddle, S. J. H. (1998). The relationship of coping and its perceived effectiveness to 
positive and negative affect in sport. Personality and Individual Differences, 24(6), 773-788. 
doi:10.1016/S0191-8869(97)00240-7 

Paolacci, G., Chandler, J., & Ipeirotis, P. (2010). Running experiments on amazon mechanical turk. 
Judgment and Decision Making, 5(5), 411-419. 

Rhee, H.-S., Ryu, Y., & Kim, C.-T. (2005). I Am Fine but You Are Not: Optimistic Bias and Illusion of 
Control on Information Security. In ICIS 2005 Proceedings. 

Schwarz, N., & Clore, G. L. (1996). Feelings and Phenomenal experiences. In E. T. Higgins & A. W.
Kruglanski (Eds.), Social psychology: Handbook of basic principles. Guilford Press.

Schwarz, Norbert, & Clore, G. L. (2003). Mood as Information: 20 Years Later. Psychological Inquiry, 
14(3), 296-303. 

Shih, T.-H., & Fan, X. (2008). Comparing response rates from web and mail surveys: A meta-analysis. Field
Methods, 20(3), 249-271.

Slovic, P., Finucane, M. L., Peters, E., & MacGregor, D. G. (2007). The affect heuristic. European Journal 
of Operational Research, 177(3), 1333-1352. 

Smith, C. A., & Kirby, L. D. (2001). Affect and Cognitive Appraisal Processes. In J. P. Forgas (Ed.), 
Handbook of Affect and Social Cognition (pp. 75-92). Mahwah, N.J.: L. Erlbaum Associates. 

Tidwell, M., & Sias, P. (2005). Personality and Information Seeking Understanding How Traits Influence 
Information-Seeking Behaviors. Journal of Business Communication, 42(1), 51-77. 

Treasure, D. C., Monson, J., & Lox, C. (1996). Relationship Between Self-Efficacy, Wrestling Performance, 
and Affect Prior to Competition. Sport Psychologist, 10(1). 

Tupes, E. C., & Christal, R. E. (1961). Stability of personality factors based on trait ratings. USAF ASD 
Tech. Rep. 

Vasey, M. W., Harbaugh, C. N., Mikolich, M., Firestone, A., & Bijttebier, P. (2013). Positive affectivity 
and attentional control moderate the link between negative affectivity and depressed mood. 
Personality and Individual Differences, 54(7), 802-807. doi:10.1016/j.paid.2012.12.012

Waters, E. A. (2008). Feeling good, feeling bad, and feeling at-risk: a review of incidental affect’s influence 
on likelihood estimates of health hazards and life events. Journal of Risk Research, 11(5), 569-595. 
doi:10.1080/13669870701715576 

Watson, D., & Clark, L. A. (1994). The PANAS-X: Manual for the Positive and Negative Affect Schedule 
- Expanded Form. University of Iowa. Retrieved from http://ir.uiowa.edu/psychology_pubs/11

Watson, D., & Clark, L. A. (1997). Measurement and Mismeasurement of Mood: Recurrent and Emergent 
Issues. Journal of Personality Assessment, 68(2), 267. 

Watson, D., Clark, L. A., McIntyre, C. W., & Hamaker, S. (1992). Affect, personality, and social activity. 
Journal of Personality and Social Psychology, 63(6), 1011-1025. 

Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and Validation of Brief Measures of Positive 
and Negative Affect: The PANAS Scales. Journal of Personality and Social Psychology, 54(6), 
1063-1070.

Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological bulletin, 98(2), 
219-35. 




Watson, D., & Walker, L. (1996). The long-term stability and predictive validity of trait measures of affect. 
Journal of personality and social psychology, 70(3), 567-77. 

7 Table of Tables 

Table 1: Age, Gender, and Educational Attainment Levels
Table 2: Differences between the U.S. and India - Awareness and Perceptions
Table 3: Differences between the U.S. and India - Risk Perceptions
Table 4: Differences between the U.S. and India - Information Sources Used
Table 5: Differences between the U.S. and India - Trait Affect and Personality Dimensions
Table 6: Information Sources Used and their Relationship with Trait Affect and Personality Dimensions
Table 7: Risk Perceptions and their Relationship with Trait Affect and Personality Dimensions
Table 8: Perceptions and their Relationship with Trait Affect and Personality Dimensions
Table 9: Risk Perceptions and their Relationship with Information Sources Used
Table 10: Perceptions and their Relationship with Information Sources Used




Computational Assessment of the Impact of Social Justice Documentaries 


Jana Diesner1, Susie Pak2, Jinseok Kim1, Kiumars Soltani3 and Amirhossein Aleyasen4
1 The iSchool, University of Illinois Urbana Champaign
2 Department of History, St John’s University
3 Illinois Informatics Institute, University of Illinois Urbana Champaign
4 Department of Computer Science, University of Illinois Urbana Champaign


Abstract 

Documentaries are meant to tell a story, that is, to create memory, imagination and sharing (Rose, 2012).
Moreover, documentaries aim to lead to change in people’s knowledge and/or behavior (Barrett & Leddy,
2008). How can we know if a documentary has achieved these goals? We report on a research project
in which we have been developing, applying and evaluating a theoretically grounded, empirical and
computational solution for assessing the impact of social justice documentaries in a scalable, robust and
rigorous fashion. For this purpose, we leverage methods from socio-technical data analytics, namely natural
language processing and network analysis, and provide a publicly available technology
(ConText) that supports these routines. In this paper, we focus on the theoretical foundations of this
project, address our methodological and technical framework, and provide an illustrative example of the
introduced solution.
1 Introduction


The need for the rigorous and scientific evaluation of the impact of social justice documentaries has been
repeatedly pointed out by funding agencies, practitioners and researchers who are active in the field of
documentaries in particular and media in general (Barrett & Leddy, 2008; Clark & Abrash, 2011;
KnightFoundation, 2011). In these domains, impact assessment has high practical relevance: when a funding
agency, e.g. the Sundance Documentary Fund, the JustFilms Division at the Ford Foundation or BritDoc,
awards a grant to a filmmaker, it wants reliable and comprehensive information on the return on its
investment, where the goal of such investments is to cause change in society. However, as explained in
the background section, the amount and depth of prior reports and actual work on this topic is limited. In
a nutshell, assessment in this domain has typically been done by using (a) traditional, scalable and
quantitative metrics, such as the number of visitors of a screening or webpage, and/or (b) conventional,
qualitative methods for studying the perception of a topic or media product by a few people in depth, such
as interviews with focus groups. Overall, the quantitative metrics are typically used on the community or
societal level (macro-level), while the qualitative methods are applied on the individual or small-group level
(micro-level). We argue that these two layers have to be integrated to gain a comprehensive understanding
of the impact of films.

Another major shortcoming of prior impact assessment work in this field is that while evaluation
methods do consider the reaction of target audiences, they fail to take into account (a) relational information
about audience members and other stakeholders as well as (b) the information produced or shared by these
groups. We have been addressing these limitations by developing a methodology and tool that help to map,
monitor and analyze (a) the social network of stakeholders involved with the main topic of a movie,
regardless of whether they have anything to do with a particular production or not, and (b) the content of
the information produced and shared by these agents. We bring these types of behavioral information (social
relationships and content) together by constructing and analyzing socio-semantic networks of social agents
(stakeholders, audiences) and information. We argue that this approach provides a more comprehensive
window into the structure, functioning and dynamics of the interplay of social agents and information than
prior approaches used in this domain do (Diesner, 2012, 2013; Gloor & Zhao, 2006; Roth & Cointet, 2010).

This paper is structured as follows: Section two reviews prior work on documentary assessment and 
concludes with identifying missing pieces. Section three addresses these shortcomings by reporting on the 
development of a theoretically grounded, computational solution for mapping and assessing impact. We put 
the proposed solution into an application context by providing an illustrative example. Section four 
summarizes the results of this work, open questions and next steps. 


2 Background 


In this section, we synthesize prior work on assessing the impact of documentaries. Basically, there are three 
families of prior studies: case studies of individual movies, proposed frameworks, and academic research. 


2.1 Individual Case Studies 


One main approach to measuring the impact of documentaries is the case study, i.e. a collection of
quantitative metrics and/or anecdotal reports on a single production. Two examples are the assessment of 
“Legacy” (Applied_Research_ Consulting LLC, 2002), and the Working Films’ evaluation of “Blue Vinyl” 
(Barrett & Leddy, 2008). Such evaluations approximate the influence of a documentary by considering (a 
combination of) the following indicators: 


- Cumulative counts of the number of screenings, video distributions, or people reached through 
campaign activities. 

- Comments from individual viewers; analyzed qualitatively on a case by case basis. 

- Lists of key organizations participating in the documentary-related campaign. Connections between 
these organizations are typically not considered. 

- A few instances of policy adoption. 


Overall, case studies can be useful in highlighting the outcomes of a specific documentary. However, they
do not generalize to other productions. In other words, this approach fails to ensure that the same 
methodology is applicable across productions and genres such that findings for multiple films could be 
compared. 


2.2 Previously Proposed Frameworks 


Various major media institutes and foundations, including the Center for Social Media, the Fledgling Fund, 
the Knight Foundation and the Rockefeller Foundation, have proposed systematic frameworks for impact 
assessment (Barrett & Leddy, 2008; Clark & Abrash, 2011; Figueroa, 2002; KnightFoundation, 2011). Each 
of these organizations has released their own framework, which typically measures impact along five to 
seven dimensions that entail the following: the aforementioned quantitative metrics plus influence on the 
individual, community, and societal level. 

The main limitation of solutions from this category is that these frameworks are of a normative
and theoretical nature, such that testing them in real-world settings might require adaptations and changes
in order to obtain accurate and actionable results. Furthermore, the indicators recommended in prior
frameworks are highly similar to the anecdotal evidence mentioned in the case studies section. In terms of
methodology, these frameworks typically combine simple cumulative frequency counts (number of
screenings, viewers, website visitors and supportive organizations) with analyses of small samples of 
narrative descriptions from participants’ self-reports. 

Some framework proposals actually include indicators related to social networks: for example, 
“interorganizational collaboration” (Fledgling Fund), “network building” (Center for Social Media) and 
“network cohesion” (Rockefeller Foundation) are mentioned as key ingredients. However, there are no further
details on how to collect, analyze, interpret and leverage network data. Even where core network metrics,
such as density and centrality, are mentioned (Rockefeller Foundation), these terms are simply introduced
as possible metrics without providing information or practical guidance for how to use these metrics in an
evaluation process.


2.3 Academic Research 


The majority of scholarly work on this topic is confined to studying psychological effects of documentaries 
on individual viewers. Thus, most scholarly publications consider documentaries as a subcategory of mass 
media. A few exceptions exist: Whiteman (2004) uses a political science perspective to study several factors 
that affect a documentary’s impact. However, since his framework heavily depends on qualitative analysis 
such as observations and content analysis, it is highly similar to the first two groups of approaches.

Summarizing the reviewed families of assessment approaches, we conclude that although various
types of approaches have been suggested and applied, most of them are similar in that they jointly consider
traditional frequency counts on a large scale and qualitative indicators on a small scale. Several proposals
have emphasized the importance of taking social networks and the content of information associated with
network members into consideration. At least in the domain of assessing the impact of documentaries, these
strategies are waiting to be put into action. The work presented herein is a step in this direction.


3 Method 


The overall process for this research project is shown in Table 1, and further explained in this section. 


Step | Description | Result
1. Theory | Comprehensive review of prior literature on impact assessment of documentaries | CoMTI: Framework of relevant dimensions/indicators of media impact (shown in Table 2)
2. Operationalization | Translate relevant indicators into metrics and indices |
3. Methods, metrics and algorithms | Map indices to methods, metrics and algorithms suitable for analyzing large-scale, empirical data | Combination of social network analysis and text mining, applied to data from social media, news coverage, interviews, and ground truth about documentaries
4. Technology | Comprehensive review of existing technology to decide whether to reuse an existing tool or build a new one (shown in Appendix) | ConText: publicly available tool for extracting network data from text data, and jointly analyzing text data and network data
5. Data Collection | Empirical: news coverage, social media data, focus groups data |
6. Analysis and Interpretation | Apply ConText to data on various documentaries | Use ConText and additional tools for evaluation of various documentaries
7. Evaluation | Assess accuracy, performance and usability of methodology and tool | In cooperation with film makers, work in progress

Table 1: Research and Development Process 


Based on our literature review (step one in Table 1 and the background section), we argue that measuring
the impact of social justice documentaries requires the capturing, modeling and analysis of the map of the
stakeholders and themes associated with (the theme of) a movie in a systemic, scalable and analytically
rigorous fashion. Specifically, in order to understand the functioning and dynamics of the wider context
surrounding a media production and its impact, we need to move beyond the level of individual and small-group
studies by also identifying the connections between people, groups and information. Furthermore, we
need to consider the content of the information associated with some campaign and discourse. These
requirements have also been suggested by media production organizations, but have not been put to the test, as
explained in the previous section.


3.1 Theoretical Framework 


We have synthesized the indicators of impact as suggested by prior work into a framework that we named 
CoMTI (content, medium, target, and impact). This model is organized along the main dimensions of 
impact assessment and respective methods as explained below: 


- Dimension: a component or process through which a documentary can achieve impact. 
- Level: a set of sub-categories of evaluation criteria per dimension. 

- Index: a set of evaluation factors per level. 

- Analytics: suitable methods for discovering meaningful results per index category. 

- Item: a set of specific features to be measured per index. 
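The nesting of these five elements can be pictured as a small data structure. The following is only a sketch under our own assumptions: the class and method names (`Dimension`, `Level`, `Index`, `all_items`) are hypothetical and are not part of any released tool; the example entries are taken from the Target dimension of the framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Index:
    """An evaluation factor within a level, e.g. 'Reachability'."""
    name: str
    analytics: List[str] = field(default_factory=list)  # suitable methods
    items: List[str] = field(default_factory=list)      # features to measure


@dataclass
class Level:
    """A sub-category of evaluation criteria within a dimension."""
    name: str
    indices: List[Index] = field(default_factory=list)


@dataclass
class Dimension:
    """A component or process through which a documentary can achieve impact."""
    name: str
    levels: List[Level] = field(default_factory=list)

    def all_items(self) -> List[str]:
        """Flatten every measurable item under this dimension."""
        return [item
                for level in self.levels
                for index in level.indices
                for item in index.items]


# Example entries from the Target dimension (hypothetical encoding).
target = Dimension(
    name="Target",
    levels=[
        Level("Audience size", [
            Index("Reachability",
                  analytics=["Web Analytics"],
                  items=["Number of viewers or visitors"]),
        ]),
        Level("Transmitter", [
            Index("Leadership",
                  analytics=["Network Analysis"],
                  items=["Number of opinion leaders"]),
        ]),
    ],
)

print(target.all_items())
```

Keeping analytics and items on the index, rather than on the level, mirrors the definitions above, where methods and measurable features are both specified per index.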


The framework is grounded in a set of theories and allows for large-scale, multi-level analysis: 


- Theoretical foundation: framework based on empirically and rigorously tested theories from domains 
including diffusion of innovation and information, media effects, marketing, social and semantic 
networks, and collective action. 

- Domain expertise: framework incorporates concepts specific to documentary evaluation that were 
suggested by experts from this domain. 

- Analytical Comprehensiveness: considered analytical methods and metrics originating from 
statistics, network analysis and text analysis. 

- Multi-modal units of analysis: considering the entity types people, organization and information. 

- Integrated approach: combines traditional strategies for measuring documentary impact (frequency 
counts and qualitative analysis) with additional methods (network analysis, text analysis). 


This framework entails a variety of stimuli that have been associated with cognitive, attitudinal and 
behavioral changes over time on the individual, communal, societal and global level. In this context, we 
consider a documentary as a special kind of media product. When it comes to identifying the impact of
media content on people, prior work can be divided into three categories (Laughey, 2007): 


- Direct impact: media content can have powerful influence on the knowledge and behavior of the 
audience. 

- Indirect impact: media content is one of several factors that affect people’s behavior and cognition.

- Null impact: media content does not have any significant influence on people’s cognition and
behavior.


Little research has conclusively confirmed or negated media impact (Sparks, 2012). Even with advanced 
research designs, evidence for a causal relationship between media and impact remains vague. Several lab 
experiments have successfully shown short-term impacts. However, the highly controlled lab study settings
are a limitation to the generalization of any findings to real-world situations. More importantly, the small- 
scale and typically point-wise nature of such work prevents longitudinal insights. Despite many open 
questions about media impacts, scholars agree that media content affects our perception and behavior in 
certain, maybe latent, ways (Bryant & Oliver, 2008; Laughey, 2007). The proposed framework assumes that 
the impact of a documentary can be measured; and that this impact can be direct, indirect or not evoked. 
Also, we conceptualize the entire process of making and distributing a documentary as a communication 
process, where participants exchange information and knowledge via behavioral signals, including natural 
language (Griffin & McClish, 2003). 

A large common denominator of media effects research is the belief that humans can be affected by 
media stimuli. The holistic process of how stimuli influence people has been dissected into five categories; 
all of which were originally suggested by Lasswell in his model of communication (Johnson & Klare, 1961;
Lasswell, 1948). Most theories of media effects fit into one or more of these categories (Laughey, 2007). We
use the Lasswell model as a backbone for the CoMTI framework by empirically identifying: What has been
said (content) on which channel (medium) to whom (target) and with what effects (impact)? The Who 
dimension is partially entailed in the medium dimension, and will also be considered when we extract 
(groups of) stakeholders from network data, and by bringing text mining methods to the medium dimension. 
In the Lasswell formula, communication happens in order to influence a target audience. Thus, 
communication is conceptualized as a persuasive process (McQuail, 2010). This aligns with the goal of 
documentaries to lead to change in people’s knowledge and/or behavior.

Applying the provided definition of media use, we argue that a documentary is not some one-way 
communication where some agent (seeks to) transfer ideas or messages to others in order to achieve certain 
effects, but rather a two-way process in which senders and receivers interact with each other: receivers’ 
responses and reactions to senders’ input form dynamic feedback loops. This inherently reciprocal and 
iterative process is represented in our framework as shown in Figure 1, and is essential to overcome 
Lasswell’s conceptualization, which has been criticized for its linear, one-way direction of communication
flow. Such feedback loops have high practical implications as film producers and engagement workers can 
leverage them to model the landscape of stakeholders and discourse associated with the theme of a 
documentary prior to and during release in order to identify relevant social agents and themes to link up 
to. This helps to strategically allocate scarce resources. 


Figure 1: CoMTI framework with Feedback Loops (legible diagram labels: Content, Medium, Target)


The CoMTI framework borrows elements from verified outcomes of media studies, but is also unique in the 
following three ways: 
- While most studies of media effects focus on one or two phases of Lasswell’s formula, our
framework models the whole communication process around a documentary.

- The proposed framework overcomes the linear, sender-driven, one-way flow of communication. 

- The proposed framework is tailored towards measuring the impact of documentaries by integrating 
dependent variables into measurable indices. 


In the next section, we briefly elaborate on every dimension of the CoMTI framework. 
CoMTI MODEL: A Comprehensive Framework for Measuring the Impact of Documentaries

Columns: Dimension | Level | Index | Analytics | Item

CONTENT
- Levels: Message; Expected Outcome; Evaluation Priority; Resource
- Index: Guiding Factor
- Analytics: Description; Ranking; Weighing
- Items: Report by producers or funding agencies

MEDIUM
- Release (Offline, Online) | Index: Outreach | Analytics: Stats | Items: Number of movies, CDs distributed; number of theatrical, Internet release; duration of release; sales of product
- Mass Media | Index: Mass Media Attention | Items: Frequency of news coverage weighted by influence (article, opinion/editorial); domestic, international broadcast
- User Media | Index: User Media Attention | Items: Twitter, Facebook, Blogs, webpages; frequency of talking about, links included, user-created contents
- [level label illegible] | Index: Prestige | Items: Number of festival acceptance; number of awards; number of professional reviews
- Interpersonal Interaction | Index: Intimate Attention | Items: Conversation, talking on the phone or email, lectures, exchange of letters, etc.
- Analytics for Mass Media through Interpersonal Interaction: Text Mining; Web Analytics; Survey, Interview

TARGET
- Audience Size | Index: Reachability | Items: Number of viewers or visitors
- Homogeneity | Index: Diversity | Items: Geography & demography: location, age, gender, education, income
- Analytics for Audience Size and Homogeneity: Text Mining; Web Analytics; Archived Data; Survey, Interview
- Audience Type: Sinker | Index: Passiveness | Analytics: Text Mining; Web Analytics | Items: Number of inactive viewers
- Audience Type: Transmitter | Index: Leadership | Analytics: Network Analysis | Items: Number of opinion leaders
- Audience Type: Collective Entity | Index: Advocacy | Analytics: Text Mining; Web Analytics; Survey, Interview | Items: Number of advocacy communities, colleges, schools, or NGOs

IMPACT
- Cognitive | Index: Awareness | Analytics: Stats; Text Mining; Web Analytics; Network Analysis | Items: Frequency of names, ideas, thoughts, or concepts appeared in corpus; report of increased awareness
- Attitudinal | Index: Sentiment | Analytics: Sentiment Analysis | Items: Frequency of positive, negative, neutral sentiments of comments; personal, critics, mass media, and organizational responses; reaction to calls for action
- Behavioral | Indices: Engagement; Connectedness; Capacity; Expansiveness; Centralization | Analytics: Text Mining; Web Analytics; Network Analysis | Items: How well connected; how much & far disseminated; how centralized is the impact; the route of diffusion; number of action pledges; alliance [remainder illegible]; discussion or decision by organizational, governmental, international policy/legislation makers; sponsorship of bills, adoption, donation, funding, implementation, social movement or intervention
- Temporal | Index: Impact Dynamics | Analytics: Longitudinal analysis | Items: Comparison b/w multiple time points; duration of impact; increase vs. decrease; change vs. stability vs. reinforcement


Table 2: CoMTI Framework for Impact Assessment 
3.1.1 Content


Studies of media impact start from the presence or absence of certain kinds of content before measuring 
impact (Sparks, 2012). Taking the explicit and implicit content of a film and the communication related to 
(the theme of) the movie into account is essential for impact assessment and related strategic communication 
and interventions. The Content dimension of the CoMTI framework consists of the following levels of
measurement:


- Message: the main message that a film wants to convey. This can be elicited from filmmakers or in 
a more empirical fashion from the film transcripts. 

- Expected Outcome: goals set by film makers for the scope of reach and intended changes. 

- Evaluation Priority: a ranked list of priorities with respect to intended outcomes, which can be 
elicited from producers. These rankings can be used to weight impact categories. 

- Resource: investment needed for a production, e.g. money, personnel, engagement work and follow-up
activities. This information can be used to assess the effectiveness of a production, i.e. how much
input is needed to move the needle by how much.


The outlined levels of content are not limited to documentaries, but also applicable to other types of media 
data, and are related to each other throughout the data collection and evaluation process. 


3.1.2 Medium 


Some scholars argue that the medium or channel, which nowadays is often an information and communication
technology, determines the characteristics of media products, content, and their political, economic, social
and cultural usage (Innis, 2007; McLuhan, 1994). Acknowledging the importance of the medium, previous
assessments of documentary impact typically report media statistics, such as the frequency of screenings,
theatrical release and broadcasts; considering higher numbers as (proxies for) greater impact (Barrett &
Leddy, 2008; Clark & Abrash, 2011; John & James, 2011). One limitation of this strategy is that exposure
does not have uniform impact across recipients. Prior studies on the diffusion of innovation have shown that
different types of adopters perceive information at different points in the life cycle of a production and with 
varying degrees of depth of impact (Rogers, 2003). Moreover, social networking effects, e.g. word of mouth, 
strongly impact this process (E. Katz & Lazarsfeld, 2006; M. L. Katz & Shapiro, 1986). Thus, the choice of 
media for a documentary is likely to shape the breadth and depth of potential impact on the public. 

Another problem is that prior studies do not differentiate between first-hand (seeing the actual 
film) versus secondary ((social) media reactions, public discourse) media exposure. We argue that this 
distinction matters because a) first-hand exposure is easier to track for distributors and b) secondary 
exposure has the potential for greater networking effects. This separation goes hand in hand with the 
distinction between push versus pull models for media: mass media (push) implies that communicators
transmit information to large and scattered audiences (Dominick, 2007; Luhmann & Cross, 2000), while 
social media (pull) is based on interactions between users, and has been found to be more influential than 
mass media in terms of credibility, speed of message transfer, and potential to change behavior (Bessière,
Kiesler, Kraut, & Boneva, 2008; Jenkins, 2006; Keen, 2007). Corresponding data can be collected from news 
archives and the participatory web, respectively. 

Finally, face-to-face interaction between individuals is another important channel. Interpersonal 
contact has been identified as the most powerful channel of cognitive, attitudinal and behavioral change 
(Bass, 2004; Rogers, 2003). These data are more difficult to collect than (social) media data; with (partial) 
mappings being possible via surveys and interviews. 


3.1.3 Target 


In marketing, the size of the reachable target audience matters; it determines for instance the cost-per-person
of an advertisement. However, for documentaries, this rationale does not apply, mainly because
producers have no tangible metric for assessing effectiveness other than the number of pairs of eyes that
have watched a film. Thus, the size of the audience can translate into impact, but needs to be complemented
with additional factors (Barrett & Leddy, 2008; Clark & Abrash, 2011; Figueroa, 2002; John & James, 
2011). 

Another issue related to the target dimension is audience diversity: the more heterogeneous the 
audience, the broader the reach. Studies in risk communication, marketing, social influence and diffusion 
have shown that audiences who are homogeneous in terms of age, sex, income, education or physical 
proximity can limit the ripple effect of communication (Page, 2007; Prell, 2012; Rogers, 2003). 

A classical finding from media effect studies is that ideas flow from media to opinion leaders to the 
rest of the world (E. Katz & Lazarsfeld, 2006; Lundgren & McMakin, 2011). In the CoMTI framework, 
formal opinion leaders, e.g. media editors and professional critics, are distinguished from informal opinion 
leaders, such as popular bloggers and grass-root organizations. The latter type of influencers can be 
identified from social media data via social network analysis (Hansen, Shneiderman, & Smith, 2010; Watts, 
2007). 

One common feature of previous efforts to measure documentary impact is the focus on advocacy 
(Barrett & Leddy, 2008; Clark & Abrash, 2011; John & James, 2011). Established communities of practice 
can be powerful change agents because members of tight-knit groups are subject to group norms (Drazin
& Schoonhoven, 1996; Rogers, 2003). The importance of communities as change agents justifies their 
inclusion as a separate indicator in CoMTI. 

Data for measuring the indices for the Target dimension mainly come from statistical reports by 
documentary producers, web analytics, surveys and archival records. For identifying informal opinion 
leaders, social network analysis is used. 
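As an illustration of how informal opinion leaders might be surfaced from social media data, the sketch below ranks accounts by normalized in-degree in a directed mention network. This is a simplified stand-in under our own assumptions: in-degree is only one of several possible leadership proxies, and the function names and the edge list are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def in_degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized in-degree per node for directed edges (source, target)."""
    unique_edges = set(edges)  # count each distinct tie once
    nodes = {node for edge in unique_edges for node in edge}
    incoming = Counter(target for _, target in unique_edges)
    denom = max(len(nodes) - 1, 1)
    return {node: incoming[node] / denom for node in nodes}


def top_nodes(centrality: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k nodes with the highest centrality (ties broken by name)."""
    return sorted(centrality, key=lambda n: (-centrality[n], n))[:k]


# Hypothetical data: (a, b) means account a mentioned or reposted account b.
mentions = [
    ("ann", "blogger_x"), ("bob", "blogger_x"), ("cat", "blogger_x"),
    ("ann", "ngo_y"), ("dan", "ngo_y"), ("blogger_x", "ngo_y"),
    ("bob", "cat"),
]

centrality = in_degree_centrality(mentions)
print(top_nodes(centrality, k=2))
```

Accounts that many others refer to rise to the top of the ranking, which corresponds to the "Number of opinion leaders" item once a cut-off is chosen.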


3.1.4 Impact 


In the CoMTI framework, impact is measured as a weighted function over four stimulus dimensions that
are associated with cognitive, attitudinal and behavioral changes over time on the individual, communal,
societal, and global level. Sometimes, a change might be clearly associated with a stimulus, e.g. the creation
of a new piece of legislation or the adoption of a policy (Barrett & Leddy, 2008).
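The exact form of the weighted function is not specified here. Purely as an illustration, the sketch below assumes a weighted sum of normalized scores, with weights derived from a producer's priority ranking via rank-sum weighting. The weighting scheme, the category names and all numbers are our assumptions, not part of the framework.

```python
from typing import Dict


def rank_to_weights(priorities: Dict[str, int]) -> Dict[str, float]:
    """Turn a priority ranking (1 = most important) into weights summing to 1.

    Assumption: rank-sum weighting; no particular scheme is prescribed.
    """
    n = len(priorities)
    raw = {name: n - rank + 1 for name, rank in priorities.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def weighted_impact(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized scores (each assumed to lie in [0, 1])."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical ranking and scores, for illustration only.
priorities = {"cognitive": 2, "attitudinal": 3, "behavioral": 1, "temporal": 4}
scores = {"cognitive": 0.6, "attitudinal": 0.3, "behavioral": 0.9, "temporal": 0.5}

weights = rank_to_weights(priorities)
print(round(weighted_impact(scores, weights), 3))
```

Deriving the weights from a ranking keeps the elicitation burden on producers low: they only need to order the categories, not assign numeric importance.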

Studies in diffusion, risk communication and social contagion generally list four levels of the range 
of impact: individual, communal, societal and global (Kasperson et al., 1988; Lundgren & McMakin, 2011; 
Marsden, 1998; Rogers, 2003). In prior conceptualizations of range, impact is assumed to start on the
individual level and branch out to the next larger level; implying a linear diffusion mechanism from small 
to large. We do not make this assumption, but acknowledge the fact that impact might diffuse between any 
of these layers, maybe in an iterative or reverse fashion. 

Research on human perception and behavior has identified the following sequential process through 
which individuals experience change: knowledge, persuasion and decision (Rogers, 2003; Slovic, Finucane, 
Peters, & MacGregor, 2004). Knowledge is generated when an individual is exposed to new stimuli or 
information and develops an understanding of them. Persuasion means that an individual forms a positive 
or negative opinion towards stimuli or information. Decision follows if an individual becomes engaged in 
activities that lead to accepting or rejecting the given inputs. There is no common agreement on how to 
collect data corresponding to each of these stages. KAP surveys have been used for several decades to provide
information on the knowledge, attitudes and practices of health behavior and innovation adoption (Launiala, 
2009). 

The CoMTI framework conceptualizes the phase of potential documentary impact as consisting of 
cognitive, attitudinal and behavioral factors and suggests corresponding indices. We choose the term 
cognitive because the mental activities related to knowledge acquisition are mainly of cognitive nature. 
Persuasion denotes the intent of communicators to induce attitudinal change in a direction desired by the 
senders. Attitudinal is neutral in that it does not imply any directionality of change. Behavior can be 
distinguished from cognition and attitude in that it represents tangible changes expressed in words or
activities. We do not assume a strictly sequential order of these stages and allow for interaction effects. 

In explaining changes in cognition, attitude and behavior, the network concept is vital. Numerous 
studies have shown that perceptions, feelings and behavior initiated by one member of a network can 
influence other network participants (Christakis & Fowler, 2007; De Nooy, Mrvar, & Batagelj, 2011; 
Marsden & Friedkin, 1993; Scherer & Cho, 2003). As discussed for the Medium dimension, social media and 
other forms of interpersonal interaction can be more influential for cognitive and behavioral changes than 
mass media exposure. Furthermore, empirical reports on measuring the impact of documentaries have listed 
the network of viewers or alliances of advocacy organizations as a sign of increased capacity (Barrett & 
Leddy, 2008; Clark & Abrash, 2011; John & James, 2011). For example, the degree of connectedness of the 
audience can be used to gauge the degree of cohesion of members for collective action. The sheer act of 
forming connections to others can be part of a behavioral change. 
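Two standard network measures can serve as rough gauges of connectedness and centralization: density (the share of possible ties that are present) and Freeman's degree centralization. The sketch below computes both for an undirected network; the helper names and the example network are hypothetical, and other cohesion measures could be substituted.

```python
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an iterable of (u, v) pairs."""
    graph: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def density(graph: Dict[str, Set[str]]) -> float:
    """Share of possible ties that are present: 2m / (n (n - 1))."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(neighbors) for neighbors in graph.values()) / 2
    return 2 * m / (n * (n - 1))


def degree_centralization(graph: Dict[str, Set[str]]) -> float:
    """Freeman degree centralization: 0 = ties evenly spread, 1 = star."""
    n = len(graph)
    if n < 3:
        return 0.0
    degrees = [len(neighbors) for neighbors in graph.values()]
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical advocacy network in which one organization links all others.
star = build_graph([("hub", "a"), ("hub", "b"), ("hub", "c")])
print(density(star), degree_centralization(star))
```

A network with low density but high centralization, as in this example, depends on a single broker, which is useful to know when assessing capacity for collective action.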

The temporal aspect of impact is an understudied issue. Many impact studies have relied on surveys 
and experiments from a single point in time, or use a survey with a before/after (watching a
documentary) design (Bryant & Oliver, 2008; Sparks, 2012). The CoMTI framework incorporates the
temporal aspect of impact by measuring indices at multiple points in time. In summary, the CoMTI 
framework considers spatial, temporal and phase-related aspects of change. 

Data for measuring the Impact indices can be obtained through intensive mining of unstructured 
and semi-structured natural language text data, e.g. from the social web. Text mining and network analysis 
techniques will be used to extract entities (including people, organizations and information) and to detect
patterned relationships between them.

In summary, the CoMTI framework bridges the gap between theory and practice by offering a 
mapping from clearly defined, practically relevant and theoretically grounded indicators of impact to (a) 
crucial dependent variables, i.e. relevant dimensions of impact and (b) cutting-edge methods for capturing,
representing and analyzing these signals based on real-world data. 


3.2 From Theory to Practical Solutions: Analysis Techniques, Technology and Methodology 


Based on the presented review of prior work and the proposed theoretical framework we conclude that 
enabling a reliable, efficient, broad and deep understanding of documentary impact requires the capturing 
and analysis of the web of stakeholders and content associated with (the theme of) a movie. This implies 
the combination of two types of techniques: 


- Social network analysis, which helps to map and assess the structure, functioning and dynamics of 
the web of stakeholders (Wasserman & Faust, 1994). 

- Natural Language Processing (NLP), which helps to identify (the valence of) salient concepts and
themes originating from or shared by stakeholders (McCallum, 2005; Mihalcea & Radev, 2011). 


3.2.1 Technology 


Conducting such analyses in a scalable and robust fashion requires automated solutions. To avoid 
reinventing the wheel, two independent experts from our team evaluated existing tools along the dimensions 
of impact defined in the CoMTI framework and additional relevant features such as pricing and license 
(Table 3). The list of tools, though by no means exhaustive, contains products currently used for 
documentary assessment and alternative solutions. The results (Table 3) show that each tool satisfies only 
a subset of the measurements laid out in CoMTI. Moreover, while some tools offer language analysis 
capabilities and others support network analysis, no single tool combines both methods. However, to measure
impact the way we defined it, the integration of text mining and network analysis is indispensable.
This justifies the need for a new tool that supports both techniques. Based on the outlined assessment of
the capabilities needed, we have been building a new, publicly available tool named ConText
(http://context.lis.illinois.edu/) that covers the following routines: 


- Data import: social media data collection from Twitter and Facebook.
- Preparing unstructured and structured natural language text data for analysis:
  - Support the generation of curated corpora of news wire data by splitting up and disambiguating
    downloaded batches from LexisNexis.
  - Organizing the respective meta-data in a database.
- Analysis of unstructured, natural language text data:
  - Summarization techniques:
    - Corpus statistics, e.g. (weighted) term frequencies
    - Topic modeling
    - Sentiment analysis
    - Visualization of topic modeling and sentiment analysis
  - Pre-processing techniques:
    - Creation and application of stop word lists
    - Stemming
    - Parts of speech tagging
  - Entity and relation extraction techniques:
    - Entity detection (for the entity classes people, places, organizations so far)
    - Codebook/dictionary construction and application
    - Relation extraction based on co-occurrence, syntax or meta-data
    - Construction of one-mode networks (association networks) and multi-mode networks


The resulting software integrates a variety of open source libraries, e.g. the Stanford parsers, as well as 
routines that we built from scratch. The software is written in Java plus D3 for visualization. The relation 
extraction part is particularly crucial for integrating text analysis and network analysis. ConText has a 
graphical user interface to ease adoption by non-technical people. We also provide a handbook and training 
material. 
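To make the link between text analysis and network analysis concrete, the following sketch shows codebook application followed by co-occurrence-based relation extraction, resulting in a weighted one-mode network. It is a minimal illustration and not the ConText implementation: the function names, the codebook and the example texts are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def find_concepts(text: str, codebook: Dict[str, str]) -> List[str]:
    """Return the codebook concepts whose surface forms occur in the text."""
    lowered = text.lower()
    found = {
        concept
        for phrase, concept in codebook.items()
        if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", lowered)
    }
    return sorted(found)


def cooccurrence_network(documents: Iterable[str],
                         codebook: Dict[str, str]) -> Counter:
    """Count how often two concepts appear in the same text unit.

    Keys are (concept_a, concept_b) pairs in sorted order; values are
    edge weights of the resulting one-mode association network.
    """
    edges: Counter = Counter()
    for doc in documents:
        for pair in combinations(find_concepts(doc, codebook), 2):
            edges[pair] += 1
    return edges


# Hypothetical codebook mapping surface forms to normalized concepts.
codebook = {
    "river alliance": "River Alliance",
    "city council": "City Council",
    "council": "City Council",
    "pollution": "pollution",
}

# Hypothetical text units (e.g. sentences from news coverage).
documents = [
    "The River Alliance urged the city council to act on water pollution.",
    "Council members debated pollution limits.",
    "The River Alliance hosted a screening.",
]

network = cooccurrence_network(documents, codebook)
for (a, b), weight in sorted(network.items()):
    print(a, "--", b, weight)
```

The choice of text unit (sentence, paragraph or whole document) controls how liberal the co-occurrence criterion is; syntax-based or meta-data-based relation extraction would replace the `combinations` step.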

We have designed and built ConText as a generally applicable tool that can also be used for text and network
analysis on data from other domains, even though the evaluation criteria from the CoMTI framework might
not apply in such cases.


3.2.2 | Methodology 


We have developed the following methodology for assessing documentary impact: 


1. Baseline model: Understand the problem space: (Where) is impact possible? 

- Mapping the public discourse and key players related to the main theme(s) of the film prior to 
release. Main themes can be identified in a data-driven way, e.g. by conducting topic modeling 
on the film transcript, or obtained from film makers or funders (based on our experience throughout 
this project, the outcomes from both strategies do not necessarily align). 

- Data and analysis: collect, analyze and combine text data and network data based on news 
coverage, social media, and focus groups; using the analysis techniques that we implemented in 
ConText for this purpose. 


- Outcomes: 
i. Analytical: Baseline model 
ii. Practical: Understand the opportunity space for connecting campaign work to relevant 
stakeholders and themes, which helps to strategically allocate scarce resources and mobilize 
social capital. 

2. Ground truth model: Understand the message of the documentary: Aiming to achieve impact with 
respect to what? 
- Applying the same text analysis techniques as used for building the baseline model, but this 
time to the film transcript. 
- Outcome: 
i. Analytical: ground truth model, i.e. the message that the film can communicate. 
3. Model of reality: Understand the film’s impact: Has the needle moved? 
- Mapping the public discourse and key players related to the main theme(s) of the film during and 
after release. 
- Outcomes: 

i. Analytical: The difference between the baseline model and the model of reality, i.e. the 
offset, indicates change with respect to the main issues addressed in the film. The 
intersection of this offset with the ground truth model indicates change due to the content 
of the film. Occurrences or mentions of the coverage of the film from social media and 
media in the same offset indicate change due to the public discourse related to the film. 
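
The comparison of the three models can be sketched with plain set operations. This is a minimal illustration and not the ConText implementation: it assumes each model has been reduced to a set of theme labels, it treats the offset as the themes that are present after release but absent from the baseline, and the function and argument names are hypothetical.

```python
def impact_offset(baseline, reality, ground_truth, film_discourse):
    """Split the change in public discourse into the parts named in the methodology.

    All arguments are sets of theme labels (a simplifying assumption; the
    models in the paper are built from text and network data).
    film_discourse holds themes found in coverage that mentions the film.
    """
    # Offset: themes in the model of reality that were absent from the baseline.
    offset = reality - baseline
    return {
        "offset": offset,
        # Change attributable to the content of the film.
        "due_to_film_content": offset & ground_truth,
        # Change attributable to public discourse about the film.
        "due_to_film_discourse": offset & film_discourse,
    }
```

A theme can fall into both categories when it is part of the film's message and also appears in coverage of the film.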


3.3 Illustrative Example 


We provide a brief illustrative example of the proposed methodology and technology. We recently presented 
our impact assessment of “The House I Live In”, a documentary by Eugene Jarecki first screened at 
Sundance in 2012, at the 2013 Sundance Creative Producing Summit, where we got plenty of valuable 
feedback on our work that we are currently incorporating into our framework and implementation 
(addressed in the limitations section). 

For this assessment, the funder of the film informed us that the main issue that the movie aims to 
have an impact on is “mandatory minimum sentence” (MMS). We collected the international press coverage 
on this topic from LexisNexis (downloading N=167 articles), and used the LexisNexis routines in ConText 
to parse, deduplicate and preprocess these data, transforming raw download data into a curated corpus 
and metadata database. 

Figure 2 shows a semantic network generated from the meta-data of the media coverage of MMS. 
This network was generated in ConText by linking any two index terms per article that occur within and 
across user-selected entity classes — in this case “subject” — and that meet or exceed the user-specified 
relevance score that LexisNexis provides. 
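
The linking rule described above can be sketched as follows. This is an illustration under stated assumptions, not the ConText code (which is written in Java): each article is assumed to be a list of hypothetical (term, entity class, relevance score) triples, and the threshold value of 85 is an arbitrary example.

```python
from collections import Counter
from itertools import combinations


def metadata_network(articles, entity_classes, min_relevance):
    """Link any two index terms of the same article that pass the filters.

    articles: list of articles, each a list of (term, entity_class, score).
    Returns a Counter mapping (term_a, term_b) to the number of articles
    in which both terms occur.
    """
    edges = Counter()
    for index_terms in articles:
        # Keep terms from the selected classes that meet or exceed the threshold.
        kept = sorted({
            term
            for term, entity_class, score in index_terms
            if entity_class in entity_classes and score >= min_relevance
        })
        # Every pair of kept terms within one article becomes an edge.
        for pair in combinations(kept, 2):
            edges[pair] += 1
    return edges
```

Passing more than one entity class links terms across classes as well as within them, which corresponds to the "within and across" option mentioned in the text.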

Figure 3 provides a summarizing visualization of the themes emerging from the bodies of the news 
articles. This was generated by applying topic modeling to the data and visualizing the main words for the 
main topics as a word cloud. These outcomes suggest that the media frame MMS as (a) a social issue 
centered on people and (b) a legal issue centered on drug abuse and sentencing. 

Figure 2: Media discourse on mandatory minimum sentencing prior to movie release (semantic networks of 
meta-data) 

Figure 3: Media discourse on mandatory minimum sentencing prior to movie release (visualization of topic 
modeling of text bodies of media coverage) 


Comparing the themes in the media coverage (Figure 3) with the themes obtained by applying the same 
technique to the transcript of the documentary (Figure 4) shows a large common denominator: both text sets 
portray MMS as a social issue. However, while the media are more focused on prisons and violence, the film 
itself is more about politics related to drugs. 

Figure 4: Message that the documentary can convey (visualization of topic modeling of film transcript) 


We assessed the media discourse on the actual movie after its release, again based on articles from 
LexisNexis (N = 167; we selected the same number of articles for comparability) that we converted into 
semantic networks based on the meta-data (Figure 5) and text bodies. Our results indicate that the press 
coverage is mainly about announcements of screenings and centered around the director, but hardly 
addresses MMS, the main issue that the movie aims to have an impact on. While we as academics might 
consider this a limitation, we were informed at the Sundance summit that producers may aim to position 
a movie as a piece of art first and a communication vehicle for some issue second. This calls for a more 
long-term cycle of evaluation, which is supported by our methodology and technology. 

Figure 5: Media discourse on “House I Live In” after movie release (semantic networks of meta-data) 


To capture the public reaction to the movie, we also conducted social media analysis using ConText. In 
addition, we used NodeXL (Hansen et al., 2010) because ConText was not yet ready for this part. In the 
following images, accounts are displayed as nodes if they have more than 200 followers, and 
node size and hue increase with the number of followers. Mapping followers and followees of 
@DrugWarMovie, the handle for “The House I Live In”, shows that even though the film was successful 
in attracting a substantial number of followers (N = 2,804), many of them are not that important or 
influential themselves on Twitter (small number of accounts displayed, small node size) (Figure 6). In fact, 
the visually represented accounts are fewer and smaller in size than the accounts which the film account is 
following, even though the film is following fewer accounts (1,735) than it has followers. This indicates an 
asymmetry between following key players (successful) and attracting key players (less successful). 

Figure 6: Twitter-sphere for @DrugWarMovie 


Zooming closer into the intersection of followers and followees (Figure 7) shows that most of these accounts 
are organizations involved with legalizing certain drugs. Only a few types of stakeholders that we consider 
relevant in this content domain are involved in the public discourse on Twitter: more precisely, one 
retired politician, two government workers, 12 small media companies and 33 NGOs. 

Figure 7: Intersection of followers and followees (red = relevant types of account, purple = any other 
account) 


We do not assume that all social media platforms lead to the same (impression of) impact of a documentary. 
Thus, we also looked at another social networking service, Facebook. The semantic networks built from 
co-occurring and highly salient terms (salience defined as TFIDF) that appear in the posts of the film’s 
fanpage suggest that the person posting those notes mainly addresses “watching the movie”, “release of the 
movie” and “war on drugs” (Figure 8). This represents classic campaign work. However, the user base 
(comments and replies to posts) not only picks up on these topics, but brings new ones to the table, mainly 
related to the prison system and people of color (Figure 9). This finding suggests that it takes an engaged 
campaign worker to get a discussion started (missing on Twitter for this particular movie). Once this has 
been achieved, one possible form of impact is the public engaging with this topic and taking it into new 
directions. Note that looking at only one social media platform would not have allowed for gaining this 
differentiated view. 

Figure 8: Co-occurrence of salient terms from posts on Facebook Fanpage for “House I Live in” 

Figure 9: Co-occurrence of salient terms from comments on Facebook Fanpage for “House I Live in” 
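
Term salience of the kind used for these networks can be sketched as below. The text does not state which TFIDF variant was applied, so this example assumes a common one (relative term frequency multiplied by the natural logarithm of the inverse document frequency) and uses naive whitespace tokenization; the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(posts):
    """Return one {term: score} dict per post, using tf * ln(N / df)."""
    tokenized = [post.lower().split() for post in posts]
    n_posts = len(tokenized)
    # Document frequency: number of posts in which a term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_posts / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def salient_terms(posts, top_k=2):
    """Return the top_k highest-scoring terms per post (ties broken alphabetically)."""
    return [
        sorted(scores, key=lambda term: (-scores[term], term))[:top_k]
        for scores in tfidf(posts)
    ]
```

Terms that occur in every post receive a score of zero under this variant, so only terms that distinguish a post from the rest would be linked in a co-occurrence network.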


4 Conclusions, Discussion and Next Steps 


Films are produced, screened and perceived as part of larger and continuously changing ecosystems that 
involve multiple stakeholders and themes. We have presented a novel, theoretically grounded, and 
practically employed and evaluated solution for mapping and assessing the impact of (social justice) 
documentaries by analyzing the web of stakeholders and information related to (the main topic of) a film 
in a systematic, empirical and scalable fashion. This solution overcomes the main shortcomings of prior 
approaches used or proposed for this purpose. The tool we built to facilitate this process is also applicable 
for conducting text mining and network analysis on data from other domains. 

Several limitations apply to our current conceptualization and implementation. First, our ground 
truth model about a film considers only one dimension of a documentary, i.e. content as represented in the 
film script, while other key elements like visuals and sounds are neglected. While we do not incorporate 
these elements into the ground truth, reactions to them are being tracked. Second, we focus on public 
awareness as reflected in social media data, news coverage and interviews with focus groups. However, an 
additional or alternative impact goal might be political and/or corporate change. In the near future, we 
plan to expand our framework and data sources to cover these dimensions as well. Currently, we are 
enhancing our entity extractor to cover additional entity classes and instances, whether or not they are 
referred to by a name. Finally, as we are conducting a range of case studies, we will synthesize our findings 
into empirical insights and try to identify patterns from these results. 

Abstract 

Even though Twitter is a fairly new Web 2.0 tool, U.S. museum leaders are extolling the platform as a 
necessary component of any online programming or presence in museums today. Because of the myriad 
ways museums can use the platform to perform educational, marketing, or engagement-focused 
programming, research on U.S. museums and Twitter is generally very broad, and because of the newness 
of the platform, research in the museum informatics literature is exploratory. The present study seeks to 
understand more about the relationship building that museums are engaging in using Twitter. Thus, this 
study explores the ways in which museums engage with online users in their Twitter feeds through coding 
the content and frequency of a sample of U.S. museums on Twitter. Through this evaluation the present 
study seeks to understand how engagement is being practiced from the kinds of dialogue museums are 
conducting with online users. 

1 Introduction 


Even though Twitter is a fairly new Web 2.0 tool, U.S. museum leaders are extolling the platform as a 
necessary component of any online programming or presence in museums today (Allen-Greil and 
MacArthur, 2010; Bearman and Geber, 2008; Fletcher and Lee, 2012; Hill, 2010; Lopez, et al, 2010; 
Osterman, et al, 2012; Simon, 2008). In fact, evidence of museums adopting Twitter is reflected in recent 
literature on the range of uses and strategies, from encouraging museums to create an online 
“community” to marketing and outreach (Allen-Greil and MacArthur, 2010; Anderson, 2009; Hill, 2010; 
Lopez, et al, 2010; Waterton, 2010). This indicates that museums are employing Twitter for various reasons 
(Fletcher and Lee, 2012; Osterman, et al, 2012). It is easy to see why museums are adopting the social 
media platform: increasing numbers of Americans use Twitter to connect with personal friends or 
businesses, and the service is free to use for both museums and museum visitors (Rainie, Smith, and 
Duggan, 2013). 

Because of the myriad ways museums can use the platform to perform educational, marketing, or 
engagement-focused programming, research on U.S. museums and Twitter is generally very broad, and 
because of the newness of the platform, research in the museum informatics literature is exploratory 
(Fletcher and Lee, 2012; Hill, 2010; Osterman, et al, 2012). The present study seeks to understand more 
about the relationship building that museums are engaging in using Twitter. To date, no research has 
focused specifically on the participatory model of Twitter, wherein users of the platform can engage in a 
direct, one-to-one dynamic. In fact, Twitter advocates in the museum community recognize that “this 
opportunity for dialogue with your no-longer-silent audience represents a fundamental shift in the way arts 
organisations will have to work in the future, and a large portion of what [museums] tweet will be responses 
to questions, comments and so on”, potentially increasing and deepening the relationship museums have 
with their visitors (Hill, 2010). 

However, how often museums are creating the opportunity for two-way engagement is still not well 
understood in the research or in museum practitioner literature. Thus, this study explores the ways in which 
museums engage with online users in their Twitter feeds through coding the content and frequency of a 
sample of U.S. museums on Twitter. Through this evaluation the present study seeks to understand how 
engagement is being practiced from the kinds of dialogue museums are conducting with online users. 
The following sections review previous studies of the use of Twitter and literature on the adoption of social 
media by museums, discuss the term engagement and how it differs from participation, describe the methods 
used to collect and analyze Twitter feeds, and discuss what further we can interpret from Twitter content. 


2 Literature 


2.1 Twitter in Museums 


Lopez, et al, present the first findings on the presence of Web 2.0 tools on museum websites from various 
countries, finding that two-way types of engagement were notably less frequent than individualized or 
passive visitation experiences (2010). In 2008, when the study collected data, content sharing sites such as 
Flickr, Facebook, MySpace, Reddit, Delicious, LinkedIn, YouTube and Twitter were active; however, the 
study noted the most common tools used by museums were blogs and RSS feeds (Lopez, et al, 2010). Of 
all the online activity from the institutions in their sample, Web 2.0 tools represented under ten percent: sharing sites 
(5%), commenting tools (2%), uploading to the museum website (2%), open forums (1%) or moderated 
forums (2%), mashup tools (0.08%), and collective construction (0.04%) (Lopez, et al, 2010). 

With increased attention paid to social media engagement since 2008, a variety of businesses are 
channeling a fair amount of attention to the Twitter space, including museums. This is customary for 
museums, which have generally adopted and incorporated new technologies into their business management 
including technologies like mobile/handheld devices for tours, personal computers for content management 
systems and backend operations, and web-based technologies for in-gallery interactive screens (Angus, 
2012). Practitioners, like Nina Simon, simultaneously were early adopters of incorporating participatory 
design principles into educational programming and exhibit design to create new spaces for a museum 
“community” of visitors to interact with museums (2010). Combined, these efforts were a natural 
progression toward incorporating Twitter into museum practice, allowing for participatory, or co-created, 
museum experiences (Ciolfi, Bannon, and Fernström, 2008; Govier, 2010; Russo, 2011; Simon, 2010). Other 
early adopters and innovative museums christened “new media” departments dedicated to valuing the new 
online “community”, and investing in participatory events or programming (Allen-Greil and MacArthur, 
2010). The rationale for this new business model is that museum visitors are moving into a more 
information-available world with digital means of communicating, so museums likewise should reflect the 
current prevailing behaviors of visitors and cater to them (Lovejoy, Waters, and Saxton, 2012; Peacock and 
Brownbill, 2007). 

To date, research on Twitter in museum practice has described use and expectations of use. In 
talking to museum practitioners, Fletcher and Lee found that staff are using Twitter most often for event 
listings or reminder notices (60%), followed by posting online promotions or announcements (42%) (2012). 
Only 11% of their survey respondents indicated that they use social media for dialogic/conversational 
engagement; however, when asked about other uses of Twitter, museum practitioners indicated some 
two-way types of behavior, including games or quizzes and engagement with other institutions (Fletcher and 
Lee, 2012). In interviews with museum information professionals from a variety of departments (like 
communications or curators), museum staff admitted that few museums are using social media for public 
engagement at all and ranked engagement as sixth out of ten options (Fletcher and Lee, 2012). Similar 
findings show that in time series data collected over six months at two Smithsonian museums, Twitter 
content had a limited range, mostly clustering around marketing types of communication: exhibition, 
upcoming activities and announcements; museum staff commentary or criticism; use new/social media; 
thanks; and sharing links and resources. Although direct inquiry, direct reply, historical information, public 
comments, solicitation for public participation, and ongoing conversation were present, they were less 
common (Osterman, et al, 2012). From this cataloging of Twitter activity, we see that museums focus 
Twitter content towards pushing out information. Additionally, museum online visitors have stated that 
Twitter in museums has no specific nature (MuseumNext, 2011). MuseumNext found that users may be 
interested in following institutions, although the range of activities users expect from a museum is broad, 
from promotion for events to dialogue with visitors (2011). Museums seem to envision Twitter as an 
opportunity to evolve their practice and role in society to reflect current technological trends and openness, 
but there is no indication from studies on their use of these tools that Twitter has changed their traditional 
means of communication (Falk and Sheppard, 2006; Fletcher and Lee, 2012; Hill, 2010). 


2.2 Museum authority and voice 


Traditional communication models for museums rely on passive absorption of concepts or context being 
offered by a voice of experts (Coldicutt and Streten, 2005; Walsh, 1997). This has led museums to align 
their educational approach to formal education models through providing curated experiences (Kefi and 
Pallud, 2011; Wetterlund, 2012). Twitter, and engagement, do not have a locus of authority, and there 
are no bounds on who can add context or interpretation to museum collections and exhibits. Some 
practitioners laud this change arguing that social media allows museums to have a “many-to-many” 
relationship with visitors where any historical or factual context can be sourced by anyone with access to a 
search engine (Coldicutt and Streten, 2005; Wetterlund, 2012). One example of a many-to-many relationship 
has developed in science museums. Users are key allies in continued research for identifying astrophysical 
material or long-term bird watching. The relationship that the Adler Planetarium developed with its online 
community relied on large datasets being crowdsourced by amateur scientists, while the verification of data 
remained in the hands of scientists based at the museum (Reed, Rodriguez, and Rickhoff, 2012; Smith, 
2012). In an interesting way, the authority over interpretation has remained in the voice of the museum, 
while encouraging interested public observers to act as a more efficient data collection mechanism. This 
model, some have argued, is the future of online cultural content, one that incorporates both visitor and 
museum knowledge in concert with each other (Graham, 2012; Wetterlund, 2012). However, not all 
museums have adopted new means of communication with their visitors. Art museums, by contrast, 
expressed concern over relaxed relationships with visitors because they view interpretive and authoritative 
expertise as their role in relationships with visitors (Wetterlund, 2012). Social media, it was argued, 
challenges curatorial authority especially (Trant, 2006; Wetterlund, 2012). 

Additionally, Twitter is a conversational medium that is completely different from the traditional 
communication models to which art museums are accustomed. Given the public nature of the platform, the 
tone and content of museum Twitter feeds serve as a complex mix of corporate, institutional voices and 
personal, individual connections. Museums are also institutions made up of several staff members, which 
may or may not color the dynamic of the relationship that museums can have with their Twitter followers. The 
burden for maintaining the institutional voice may fall to one staff member responsible for tweeting, and 
the decision of which person in what department is very telling for how museums want to continue to be 
perceived. Marketing department staff will approach the issue of voice very differently than education staff, 
website staff, or curators. 


2.3 Participation vs. engagement 


Most museums start using Twitter to enhance their ability to reach new audiences and to connect 
with visitors in a more meaningful way (Angus, 2012; Cameron, 2003; Fletcher and Lee, 2012; Osterman, 
et al, 2012). Counting website hits and followers on Twitter may be able to satisfy this initial interest in 
using Twitter, but counts cannot simply be used as a proxy measurement for success with any online 
activity (Allen-Greil, 2012; Chan, 2008). There is a faction of museum practitioners and researchers who 
are looking beyond Twitter use to “visitor engagement” online (Fantoni, Stein, and Bowman, 2012; 
Richardson and Visser, 2012). However, this faction has not come to an agreement about when and where 
programming is deemed “engaging” or how to measure engagement in activities where the museum and 
visitors are both contributing. 

In fact, it is becoming increasingly difficult to parse participation and engagement in online activity, 
especially on Twitter. It’s common in the museum literature for these two terms to be used interchangeably 
to describe some activity or program wherein visitors were active and made a contribution, however small 
or large that may be. Therefore, any participation is counted, and simultaneously counted as engagement. 
Logically, this basic understanding of engagement and participation and the presence of both together does 
not help museums understand how they are having an impact on the quality of the experience they provide 
visitors. In fact, in Nina Simon’s book, The Participatory Museum, she outlines measures for escalating the 
visitor’s ability to participate in the museum-to-museum visitor relationship from very little to taking over 
the museum, but not the impact or engagement of a participatory experience (2010). 

Additionally, the studies on Twitter use in museums have demonstrated engagement is difficult for 
museums to perform, and also difficult to measure. Fletcher and Lee note that even in considerations of 
time (i.e., how long museum staff spend on social media) meaningful engagement is still not a direct result 
of how much time museum staff spends on social media (2012). Similarly, Osterman, et al, found in 
time series data (over six months) of Twitter content from two museums that types of participation, such 
as solicitations to the public and ongoing conversations, were not primary content (2012). Recently, new 
capacity building literature emphasizes that in order to promote engagement and to understand this 
behavior from users, museums must also define what kinds of participation they are interested in 
encouraging in their visitors (Richardson and Visser, 2012). While this next step is encouraging, it also 
lacks a clear definition of engaging activities, or how to measure success. From these resources, it is clear 
that capturing data on engagement requires different measurements that provide a common interpretation 
of what behavior from museums and what behavior from online visitors is engaging. 

Increased knowledge and testing for measuring engagement can begin laying benchmarks for 
activities, programs, and implementation strategies for social media use that would differentiate 
participation from engagement. To date, studies focused on Twitter use cannot tell us a lot about how 
museums are interacting with their online visitors, simply that museums are pushing information to visitors 
online. Although museum professionals surveyed have not indicated that conversational, two-way 
engagement is a primary focus, the types of dialogue that they are engaging in have not been well explained. 
In the following sections, the present study attempts to answer that question and to describe the frequency 
with which engagement is happening by testing an engagement coding rubric. 


3 Methods 


In order to know how a museum is employing a level of engagement, a natural place to investigate would 
be the social media policy or guidelines that dictate how staff or the public engage in the platform and what 
content is deemed inappropriate. Unfortunately, not many museums have such policies available. Therefore 
the present study seeks to answer two broad questions about social media use: 


1) What are the types of tweets museums are posting in Twitter? 
2) How, and through what activities, are museums engaging their visitors via Twitter? 


Using a multi-method approach, the present study employed quantitative counting and categorization of 
content tweeted by a purposeful sample of 50 museums (R1). Twitter was selected because it is a source of 
textual content and because many museums are using the platform. Outwit Hub Pro software was employed 
to scrape Twitter postings from each museum in the sample, along with time stamp, user ID, and tweet ID. 
Twenty postings were collected from each museum’s publicly available feed on October 13, 2012, showing 
activity from the date of collection backwards over time. After data collection, 48 cases were kept for 
analysis. The sample was constructed to collect data from across museum type and size, though 18 museums 
in the sample represent art and history institutions and 18 represent science institutions. Twelve museums 
in the sample are non-disciplinary types including children’s museums or halls of fame. 


Dimension     Level of Engagement                                                 Score   Weight

Count         Number of followers or friends
              Number following or friends
Reliability   Clearly states the official status of the social media page         0/1     1
              Uses recognized museum logo as profile picture                      0/1     1
Content       Not only recycled content from other social media network           0/1     1
              Uses platform specific tools (e.g. #hashtags, @replies, retweets,   0/1     2
              favorites, tags, playlists, events, etc.)
              Multiple social media channels with links across platforms          0/1     3
              Has “live content/interactive” sessions (e.g., Twitter townhall)    0/1     3
Findability   Access to social media clearly indicated on main homepage           0/1     1
              Social media searchable through internal search engine              0/1     1
Frequency     Once a month<frequency<once a week                                  0/1     1
              Once a week<frequency<every day                                     0/1     2
              New content posted every day                                        0/1     3
              Several times a day                                                 0/1     3
Engagement    Once a month<Active user engagement<once a week                     0/1     1
              Once a week<Active user engagement<every day                        0/1     2
              Actively engages (and responds) to users every day                  0/1     3
              Not at all                                                          0/1     0

Table 1: Coding dimensions 


A categorization protocol was adapted from a Stratford Institute for Digital Media report to capture types 
of activity that a museum engages in through the Twitter platform (Stratford, 2012). Table 1 above shows 
the evaluation dimensions and scoring rubric. 
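
To illustrate how such a rubric can be tallied, the sketch below multiplies each dichotomous score by the weight listed in Table 1 and sums the results per dimension. The weights are taken from the table; the short level names and the summation per dimension are assumptions made for this example, since the text does not spell out how the weighted scores were aggregated.

```python
# Weights per level, copied from Table 1. The level keys are hypothetical
# shorthand for the table rows. The Count dimension carries no weight.
RUBRIC = {
    "reliability": {"official_status": 1, "museum_logo": 1},
    "content": {
        "not_only_recycled": 1,
        "platform_tools": 2,
        "cross_platform_links": 3,
        "live_sessions": 3,
    },
    "findability": {"on_homepage": 1, "internal_search": 1},
    "frequency": {
        "monthly_to_weekly": 1,
        "weekly_to_daily": 2,
        "daily": 3,
        "several_per_day": 3,
    },
    "engagement": {
        "monthly_to_weekly": 1,
        "weekly_to_daily": 2,
        "daily": 3,
        "not_at_all": 0,
    },
}


def weighted_scores(coding):
    """Sum score * weight per dimension.

    coding maps a dimension name to {level: 0 or 1}; levels that were not
    coded are treated as 0.
    """
    totals = {}
    for dimension, levels in RUBRIC.items():
        observed = coding.get(dimension, {})
        totals[dimension] = sum(
            weight * observed.get(level, 0) for level, weight in levels.items()
        )
    return totals
```

Under this scheme a museum coded as "not at all" on engagement contributes zero to that dimension regardless of how often it posts.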

The Stratford Institute rubric provided two legitimate bases for this study: 


1) The rubric could be validated for its applicability to engagement activity in Twitter, and 
2) it allowed the present study to focus specifically on the coding of engagement activity. 


Dimensions were coded using the rubric above to count the number of followers, reliability of activity, 
content, findability of Twitter account, and frequency of posts. Each tweet coded in the study was evaluated 
against the dimension criteria and scored using a dichotomous system; zero indicated that the description 
of the activity was not satisfied, and one indicated the activity was satisfied. After coding the data for the 
frequency dimension, the coding rubric was augmented to add a temporal dimension for “several times a 
day” to credit museums in the sample for high activity. Levels within each dimension were weighted and 
scored depending on the level of engagement present in each activity. The author coded all dimensions 
except engagement independently. Two coders coded the engagement dimension independently, and results 
were compared for intercoder reliability. 
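
The text does not name the statistic used for this comparison. As one common option, Cohen's kappa corrects the observed agreement p_o between two coders for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is offered only as an illustration of that statistic, with hypothetical function and variable names.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long lists of category labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n_items = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n_items
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n_items**2
    if expected == 1:
        # Both coders used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)
```

A value of 1 indicates perfect agreement, while a value near 0 indicates agreement no better than chance.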

Because of the difficulty in measurement, the base exchange of a museum to a visitor will be taken 
as participation in the present study, whereas continued one-to-one conversation in Twitter will be counted 
as engagement. This allows for graded levels of how engaged a museum can be with its online community 
in the social media space, as it represents a relationship between parties interested in the museum and a 
museum interested in responding in kind to the visitor (as if to have an almost personal kind of relationship). 

In lieu of social media policies, content from Twitter postings was analyzed as a proxy for a de 
facto social media policy or communications strategy employed by museums in the study. All Twitter 
content from museums in the sample was further analyzed using the engagement dimension listed above in 
Table 1. Engagement represents the intersection of content with frequency, and describes how often the 
activity within the Twitter platform between a museum and its users is dynamic enough to show an engaging 
(e.g., participatory) relationship (R2). In the engagement dimension, some content may be more engaging, 
original, or museum-specific; therefore, different weights are assigned to the level of interaction observed 
in the tweets. Once the engagement dimension was coded, the rubric was augmented to account for museums 
in the sample that did not engage in any activity that showed a level of interaction with users. Additional 
analysis compared results for art and history museums with those for science museums to see if art and 
history museums are more restrictive in their engagement of community participation. This assumption was 
based on the abovementioned opposition art museums had to social media and participatory experiences. 


3.1 Limitations 

Four primary limitations impact the findings in this study: the small sample size, the cross-sectional data 
collection method, the coding rubric, and the lack of additional data. The small sample of museums in the study 
allows for in-depth analysis of the Twitter activity of each case in the sample. A larger, representative 
sample would have yielded generalizable findings; however, the small number also tests the rubric and its 
applicability in this study after adapting it from the original application and narrowing this study to the 
Twitter platform. Ideally, Twitter activity would have been collected in a time series to compare findings 
of engagement from a single museum over time. It is possible that the date and time when data collection occurred 
did not yield engagement findings even though a museum in the sample could generate engaging activity at other times. 
Without being able to compare more than 20 tweets from each museum in the sample, it would be difficult 
to characterize the Twitter feed of a museum generally. The rubric, however, does provide a clear method 
of how to capture engagement as a combination of content and time to mitigate the lack of evidence from 
cross-sectional data. The issue with the rubric is that it also captures a wide range of activity in the 
engagement dimension, which provides a complex picture of engagement across the sample. Lastly, this 
study does not capture the lifecycle of a tweet to contextualize how museum tweets generate activity 
amongst their followers. People who are connected to the museum can engage with each other using the 
museum tweet as a catalyst for conversation and dialogue without actually engaging with the museum 
directly. Retweets, in this instance, would provide context for engagement that may change the 
perspective on how successful the information tweeted by museums is in engaging users. 


4 Findings and Discussion 


While not widely generalizable, the findings from this study highlight potential areas for more detailed 
investigation with a broader sample from the museum sector. Initial tabulations showed that more than 
half of the sample had a higher number of followers than the institution followed, noted the official status 
of the museum in the Twitter profile, and used the official logo for the Twitter handle. Almost all sampled 
museums made the Twitter feed (and other social media) findable from the museum homepage, and all 
institutional Twitter feeds were findable through search engines. Tables 2, 3 and 4 below show a detailed 
summary of the findings from the study from the more complex dimensions. 

In the content dimension, the largest portion of museums (97.9%) was cross-referencing social 
media platforms in their Twitter postings. Generally, the cross-referencing linked out to Instagram photos, 
Facebook posts, or the museum website to reference ongoing or upcoming activities. Use of the reply 
function, hashtags, and shortened URLs was the next most common content on the museum Twitter feeds. 
This is not surprising given that these two content activities often combine, for example, to direct Twitter 
traffic to a website announcement for a family program at the museum by including the shortened link 
in a tweet. Recycled content, such as tweets from other users, was still common amongst the sample; 
however, the presence of recycled content could also suggest that museums are building online communities 
where they may want to repost the tweets of other Twitter users because they are of interest to their followers. 
Only one museum in the sample was livetweeting an event through successive posts. 

Overall, it appears that a large number of museums in the sample focused on original content in 
their Twitter feed. This suggests that Twitter use is becoming a core business function with dedicated staff 
time for uploading and attending to postings. This comports with Fletcher and Lee’s findings that indicate 
almost all US museums are using some social media platforms, and at least one staff person is dedicated to 
maintaining social media for their museum (2012). In fact, each of the museums in the sample was also 
found to be active in Facebook. 


Content Dimension Aspects        Overall   Art/History   Science   Other 
Recycled Content                 29.2%     38.9%         11.1%     41.7% 
Platform tools                   89.6%     88.9%         88.9%     91.7% 
Multiple social media channels   97.9%     94.4%         100.0%    100.0% 
Live content                     2.1%      5.6%          0.0%      0.0% 


Table 2: Content dimension by disciplinary type 


The majority of the sample was tweeting several times over the course of a day. The frequency dimension 
did not originally count high activity, though after the coding was completed, the number of museums 
tweeting very often was deemed an important marker of the investment museums are making in their social 
media strategy. There are remarkable differences in frequency across disciplinary types of museum. Art and 
history institutions were very active on Twitter, with none falling behind one tweet per week. Other types of 
museums in the sample, such as children’s museums, halls of fame, and historic villages, had less overall 
activity, with 16.7 percent posting as rarely as once per month. Interestingly, science museums fell in between 
art institutions and other institutions, with a high rate of activity (72.2%) posting several times per day, 
though a smaller portion of science institutions tweeted every day. In comparison, art and 
history museums showed the highest rates of everyday posting, with a slightly smaller 
percentage of those museums tweeting several times over the course of a day. 


Frequency Dimension Aspects   Overall   Art/History   Science   Other 
Once/month                    6.3%      0.0%          5.6%      16.7% 
Once/week                     18.8%     11.1%         27.8%     8.3% 
Every day                     77.1%     88.9%         66.7%     75.0% 
Several times a day           75.0%     83.3%         72.2%     75.0% 


Table 3: Frequency dimension by disciplinary type 


It is important to highlight that two types of activity were observed when coding the engagement 
dimension: 1) participation, such as a museum reply to a user who had already posted to the museum account stating 
something like “Just had a great time at X museum”, where the museum simply replies by thanking that 
user, and 2) engagement, meaning dialogic activity between a museum and a user wherein multiple exchanges 
are evidence of a dialogue. Given this, there will be some content in the engagement dimension that 
is somewhat meaningless because the tweeted conversation might be limited to two tweets, especially 
if the user prompts the exchange. However, for some museums this may be the extent to which they are 
able or interested in interacting with visitors on Twitter. 

For art and science museums, no engaging activity is the highest percentage of all activity (for science 
museums, it is tied with engagement once per week at 38.9%). While 
this may appear to show that museums are not active on Twitter, it simply means that no engaging dialogue 
was present in the data. Many of the institutions that are tweeting about event programming are the sole 
voice heard in the Twitter feed; however, there is evidence that some users are tweeting to museums that 
are not responding. There is also some difference between disciplinary types of museums in terms of time, 
where science museums are engaging with visitors at a less frequent rate of once per week. In contrast to 
the finding above that other museums are tweeting less frequently, it appears that they are engaging to a 
greater degree at a rate of once per month. This describes an interesting thread in the data on other types 
of museums, where infrequent tweeting and less overall online engagement from these institutions could be 
related to an approach to Twitter in which less frequent posts result in deeper engagement overall. 

For children’s museums in the other category, in particular, Twitter may not be the primary 
platform for interactions and engagement. Although 93 percent of teens report having at least one 
Facebook account, only 12 percent of teens are on Twitter (Lenhart et al., 2011). It may also be that 
engagement with children online is focused on more visual platforms like YouTube or Vine. 


Engagement Dimension Aspects   Overall   Art/History   Science   Other 
Once/month                     12.5%     5.6%          5.6%      33.3% 
Once/week                      22.9%     11.1%         38.9%     16.7% 
Every day                      27.1%     38.9%         16.7%     25.0% 
Not at all                     37.5%     44.4%         38.9%     25.0% 


Table 4: Engagement dimension by disciplinary type 
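
The percentages in Table 4 can be converted back into museum counts as a consistency check. The sketch below is illustrative only. The total of 48 museums is stated in the conclusion, but the group sizes (18 art/history, 18 science, 12 other) are an assumption inferred from the percentages and are not reported in this section. The function names are hypothetical.

```python
# Percentages transcribed from Table 4 (engagement dimension).
TABLE_4 = {
    "Once/month": {"Overall": 12.5, "Art/History": 5.6, "Science": 5.6, "Other": 33.3},
    "Once/week": {"Overall": 22.9, "Art/History": 11.1, "Science": 38.9, "Other": 16.7},
    "Every day": {"Overall": 27.1, "Art/History": 38.9, "Science": 16.7, "Other": 25.0},
    "Not at all": {"Overall": 37.5, "Art/History": 44.4, "Science": 38.9, "Other": 25.0},
}

# ASSUMPTION: group sizes inferred from the percentages, not stated in the text.
GROUP_SIZES = {"Art/History": 18, "Science": 18, "Other": 12}
TOTAL_MUSEUMS = 48  # Stated in the conclusion.


def implied_counts(table, group_sizes, total, tolerance=0.05):
    """Recover whole-museum counts and check them against every percentage."""
    counts = {}
    for level, row in table.items():
        by_group = {g: round(row[g] / 100 * n) for g, n in group_sizes.items()}
        for group, n in group_sizes.items():
            # The recovered count must reproduce the published group percentage.
            if abs(100 * by_group[group] / n - row[group]) > tolerance:
                raise ValueError(f"{level}/{group} does not match a whole count")
        # The group counts together must reproduce the Overall column.
        if abs(100 * sum(by_group.values()) / total - row["Overall"]) > tolerance:
            raise ValueError(f"{level}: groups do not add up to the overall figure")
        counts[level] = by_group
    return counts


counts = implied_counts(TABLE_4, GROUP_SIZES, TOTAL_MUSEUMS)
```

Under these assumed group sizes every cell corresponds to a whole number of museums, and the four engagement levels account for each museum exactly once.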


4.1 Users of social media 


Recent findings from the Pew Internet and American Life Project situate some of the activity and results 
from the sample. As of August 2012, 67 percent of adult users 18 and older use social networking sites; 
however, only 16 percent of those adults are using Twitter (Duggan and Brenner, 2013). The expectation 
may be that Twitter is a newer platform than Facebook; however, there is a qualitative difference between 
the kinds of engagement that are possible through each site. While Facebook may be more popular with 
adults (Pew notes 66 percent of adults using it, more than any other social networking platform), it is also 
generally a one-to-many platform where institutions produce all of the content that is consumed through 
a Facebook page, and commenters can engage with the content posted by contributing insights, ideas, 
or likes in the comments section below each post (Duggan and Brenner, 2013). Facebook data for each 
museum in the sample was collected over the same time period, revealing that no other user was posting to 
the Facebook page of a museum, though they were able to comment. This type of limitation of the platform 
does not directly connect the visitors with the museum, and directs all activity by the visitors at other 
commenters. Twitter, by contrast, directly connects online visitors with museums through a one-to-one type 
of exchange. The type of engagement that exists in each platform is markedly different because of the ability 
for direct exchange with museums. 


4.2 Does Twitter help museums engage with visitors? 

The results of the study indicate that Twitter does not help museums to engage with visitors. It appears 
that museums in the sample have taken to the Twitter platform as a means to enhance their marketing 
practices; however, a picture emerges from the study that engaging behavior, and opportunities for online 
visitors to participate, are less frequent and less common than is possible in the Twitter platform. When 
compared to the findings from MuseumNext’s survey in 2011, and Fletcher and Lee’s study from 2010, 
museums and Twitter followers of museums seem to expect this activity. The echo that resounds from this 
space is hard to understand given that Twitter intentionally places a premium on direct connection between 
users. Whether or not the primacy of marketing on Twitter, and its expectation, is intentional, the fact 
that museums and their Twitter followers are conceptualizing the platform as a space primarily for 
marketing ensures that control over conversations online remains in the hands of the museum. 

This is evidenced in different categories of activity that emerged as more or less engaging from the 
study findings (Figure 1). To a large degree the activity observed among museums in the sample was focused 
on one-to-many traditional communication. Those activities require and elicit fewer (or lesser) participatory 
behaviors from a museum’s Twitter followers. Even posting “fact of the day” information to Twitter, an 
arguably less marketing-driven and more educationally driven activity, does not prompt responses from 
visitors. By contrast, some museums in the study were inviting participation by taking a concept like “fact 
of the day” and turning it into a game where a picture of a collections item would be accompanied by a request for 
Twitter followers to guess the photo’s content. Livetweeting an event also invited online 
visitors to join a conversation about a program happening at the museum. 


[Figure 1 lists the following activities: public relations; games (voting, “like if”); events announcements; 
co-curating projects; fact of the day; live tweeting events; retweeting other users’ or institutions’ content.] 


Figure 1: Museum activities in Twitter that are engaging 


Fletcher and Lee noted that content was one way museum information professionals conceptualize 
generating more participatory or engaging behavior in social media, indicating that the more “social activity” 
that can surround the social media content (e.g., the number of mentions of a museum on sites not related 
to the museum), the greater the success with engagement (2012, p. 13). While this approach would 
surely encourage museum online visitors to talk independently about the museum amongst each other, 
thereby creating a many-to-many conversation, the most important way for a museum to remain in the 
conversation and promote the many-to-many engagement that is possible via Twitter is to define a 
community of online users and set goals for how they expect others to engage with each other (Hill, 2010; 
Richardson and Visser, 2012). 


5 Conclusion and future research 


In collecting Twitter posts from 48 museums across the US, the present study evaluated how the platform 
is used by museums and to what degree the content on Twitter represents online engaging activity. The 
majority of museums in the sample (97.9%) are posting with cross-platform content signaling that they use 
several social media channels and are connecting their online activity for users who may only reach the 
museum through one platform. Remarkably, recycled content (a low engagement activity) is low amongst 
museums in the sample (29.2%); however, livetweeting or Twitter townhall (a high engagement activity) 
was the least common of all content activities (2.1%). More than three-quarters of museums in the 
sample tweeted every day with three-quarters tweeting several times a day. Frequency amongst types of 
museums had the most fluctuation. Other museums had the fewest postings at 16.7 percent tweeting once 
per month. Interestingly, tweeting less frequently did not necessarily yield less engaging activity. For some 
museums, tweeting infrequently may also result from focusing on other social media outlets where their 
audience base is more active. 

Findings from this study demonstrate museum and online museum user behavior at one time. 
Additional data collection is needed to confirm the behaviors and activity present in Twitter to produce a 
more holistic picture of when engagement may occur, even if that activity is episodic or infrequent. While 
other studies have confirmed that engagement overall tends to be low across the US museum sector, these 
characterizations also represent different interpretations of how to measure and define engagement. 

Future research will require understanding users that are participating with museums online beyond 
demographics to focus on the types of activity that online museum visitors respond to and enjoy. A social 
network analysis of a museum’s online community will describe how users are connected to the museum, to 
each other, and lead museums to form a sense of online community with whom they wish to engage. 
Museums could then expect to define how they want to engage with that community by first defining their 
role in it, and the behaviors they ultimately want to encourage within an online community of supporters. 


Abstract 

As a global phenomenon, Irish traditional music has a tremendous following, while practitioners of Irish 
traditional music often disagree on shared aspects of their music culture. This provides numerous 
challenges when organizing Irish traditional music for retrieval purposes. This paper describes 
the music information seeking and retrieval (MIR) challenges of Irish traditional music in terms of the 
physical paradigm and user-centered relevance, using TheSession.org as an example. Limitations of the 
physical paradigm are addressed, both related to the traditional music subject matter, and how current 
MIR systems fall short in their attempts to manage Irish traditional music. Additional discussions of 
user-centered relevance contextualize the problems related to music information seeking and retrieval 
with traditional musics. In the future, representations of music objects connected as linked open data, 
combined with multiple query capabilities, may provide a robust and flexible structure for traditional 
music information seeking and retrieval. 


1 Introduction 


Music Information Retrieval (MIR) as a discipline has been primarily concerned with developing tools and 
methods centered around properties and characteristics of Western “common practice” musics, meaning 
Western Classical and Western Popular music (Byrd & Crawford, 2002; Downie, 2003; Tzanetakis et al., 
2007; Serra, 2011). More recent interest in ethnic or world musics has emerged in MIR research (Tzanetakis 
et al., 2007; Doraisamy et al., 2008; Gomez & Herrera, 2008; Gómez, Haro, & Herrera, 2009; Gedik & 
Bozkurt, 2010; Lidy et al., 2010; Ioannidis, Gómez, & Herrera, 2011; Koduri et al., 2012), which has resulted 
in a re-examination of how existing tools can be used for a diverse array of musics. Although interest has 
increased, research approaches to MIR using world musics have remained largely centered in the physical 
paradigm of Information Science: a grounding in the belief that physical properties and characteristics of 
information make it tangible and able to be manipulated for the purposes of organization, description, 
and retrieval. 

Most traditional practices or art forms are transmitted orally and aurally from master to apprentice, 
with less of an emphasis on written records or instructions. Likewise, sources of information within traditions 
are the practitioners themselves, as opposed to more scholarly, written sources. Because of the global 
popularity of Irish traditional music, the greater Irish music community is a diverse group of musicians with 
various demographic, geographic, and cultural backgrounds. As these Irish music practitioners exchange 
information within online environments, the collaborative and social nature of music information sharing, 
seeking and browsing creates a large, but messy information corpus. Most of the Web-based Irish traditional 
music communities and tune databases are user-submitted and moderated, such as TheSession.org, Chiff 
and Fipple, Concertina.net, and the newer social network for Irish musicians, TradConnect. TheSession.org 
is particularly popular, and contains a large database of traditional and composed Irish dance music, a 
recordings database, events, local sessions, links, and discussion boards. 

Irish music information is represented in many forms online, namely in textual, visual, and audio 
formats. The methods for finding and extracting relevant information can involve multiple steps: multiple 
searches using different aspects of music metadata like a tune title, key/mode, and tune type, viewing 
transcriptions, and even listening to a MIDI file to confirm the correct tune has been found. In addition, 
melodic-string matching queries may involve knowledge of an additional textual language, an ASCII-based 
representation of music notation called ABC Notation. This form of notation has become a popular method 
of transcribing the basic melodies of Irish traditional music, along with other types of traditional music. 
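
To make the format concrete, the sketch below shows a short tune in ABC Notation and splits it into header fields and melody. The header letters (X, T, R, M, L, K) are standard ABC fields, and the K: field closes the header. The tune itself is made up for illustration and is not a record from TheSession.org, and the parser and its names are a simplified, hypothetical sketch rather than a full ABC implementation.

```python
# A made-up tune, written only to illustrate the format; it is not a
# transcription of a real tune or of any record on TheSession.org.
EXAMPLE_ABC = """X:1
T:Example Jig (hypothetical)
R:jig
M:6/8
L:1/8
K:Dmaj
ABA A2E|FED D3|ABA A2E|FED E3:|
"""

# Readable names for common ABC header fields.
HEADER_NAMES = {
    "X": "reference",
    "T": "title",
    "R": "rhythm",
    "M": "meter",
    "L": "unit_note_length",
    "K": "key",
}


def parse_abc(text):
    """Split one ABC tune into a header dictionary and a melody string."""
    header = {}
    melody_lines = []
    in_melody = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        is_field = len(line) > 1 and line[0].isalpha() and line[1] == ":"
        if not in_melody and is_field:
            name = HEADER_NAMES.get(line[0], line[0])
            # A field such as T: may repeat, so values are kept in a list.
            header.setdefault(name, []).append(line[2:].strip())
            if line[0] == "K":
                in_melody = True  # The K: field ends the tune header.
        else:
            melody_lines.append(line)
    return header, "".join(melody_lines)


header, melody = parse_abc(EXAMPLE_ABC)
```

Because the title field can repeat, a single record can carry several titles for the same tune, which is relevant to the bibliographic disagreements discussed later in the paper.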

This paper describes the music information seeking and retrieval (MIR) challenges of Irish 
traditional music in terms of the physical paradigm and user-centered relevance, using TheSession.org as an 
example. Specifically, the paper will address limitations of the physical paradigm both related to the 
traditional music subject matter and how current MIR systems fall short in their attempts to manage 
traditional music information. Additional discussions of user-centered relevance will contextualize the 
problems related to music information seeking and retrieval in systems that contain Irish traditional music 
content. 


1.1. Limitations of the Physical Paradigm 


Music information can be represented in many different ways apart from an audio signal (Downie, 2003; Liem 
et al., 2011). While music information retrieval systems typically house a combination of audio files and 
associated textual metadata, music is manifest in many diverse representations. Liem et al. (2011) argue for 
music information retrieval to be approached instead as multimedia retrieval, noting how “multiple other 
modalities hold useful information that contribute to the way in which the music is conveyed and 
experienced: e.g. visual information from video clips and cover art, textual information from metadata, 
lyrics and background articles, and social community information on listening and rating behavior. This 
existence of complementary representations and information sources in multiple modalities makes music 
multimedia content” (p. 1). The authors also emphasize a user-centered approach based upon the 
experiential and subjective qualities inherent in music. 

With text-based IR forming the basis for the development of IR systems, Music Information 
Retrieval (MIR) is prone to several challenges, similar to those in other multimedia retrieval (Downie, 2003; 
Inskip, MacFarlane, & Rafferty, 2010). A decade ago, Downie (2003) articulated five challenges 
for the field’s development: the multifaceted, multirepresentation, multicultural, multi-domain, and 
multi-experiential challenges. Downie views music as comprising seven facets: pitch, temporal, harmonic, 
timbral, editorial, textual, and bibliographic. The interaction between these complex facets, and the 
difficulties that ensue, are what Downie terms the “multifaceted challenge” (2003, p. 297). Irish traditional 
musicians, as with many traditional music practitioners, frequently disagree with one another in the areas 
of the pitch facet (differences in tune melodies), temporal facet (meter), harmonic facet (chords that imply 
key/mode), timbral facet (idiomatic transcriptions that favor a particular instrument), and bibliographic 
facet (tune titles, composers, recordings lists, published collections containing the tune). 

In addition to the multifaceted challenge, Downie’s multirepresentation challenge speaks directly 
to the physical paradigm in MIR. This challenge encompasses the many forms in which music might be 
represented symbolically, aurally, or both. Within tune databases that house Irish music, most of the 
representations are what Downie terms “symbolic”: ABC Notation, GIF sheet music, and MIDI audio, which is not 
considered “audio” according to Downie’s classification because it is machine generated, and therefore an 
aural symbol of music (2003, p. 302). True audio representations, according to Downie, are live performances 
or digital and analog recordings of performances. 

The multifaceted and multirepresentation challenges have immediate applicability to the indexing 
and organization of Irish traditional music content, as well as music content from other traditions. 
However, the multicultural, multi-experiential, and multi-domain challenges could be viewed as embedded 
and inseparable from the others, as music is experienced and perceived differently across the globe and 
between various knowledge communities (Byrd & Crawford, 2002; Downie, 2003). While the physical 
paradigm of Information Science looks for logic and consensus when organizing content for retrieval, the 
messy reality of musical traditions defies such expectations. 

The physical paradigm emphasizes the importance of establishing what defining characteristics an 
information object possesses before it can be organized for retrieval, and this is regardless of format. Music, 
like other information objects that are “not documents in the normal sense of being texts can nevertheless 
be information resources, information-as-thing” according to Buckland (1991, p. 354). According to Raber 
(2003), the physical paradigm views information objects as possessing characteristics that “make[s] them 
organizable and retrievable in logical and predictable ways” (p. 53). Objects cannot be organized 
according to meaning or individual perception, but rather one should try to establish an independent 
meaning for the information object. However, Buckland (1991) puts forth that the nature of information is 
situational and depends upon consensus over what is informative. 

Raber (2003) also discusses a consensus over what is informative, which he terms “shared meaning”: 
something separate from cultural or social context. Shared meaning implies that there is agreement upon 
“objective, intrinsic properties” even if the cultural interpretation of those properties varies between 
individuals (Raber, 2003, p. 57). Consensus over shared meaning is unlikely due to how music is experienced 
by individuals and groups, as “the way music is experienced is strongly guided by affective and subjective 
context- and user-dependent factors” (Liem et al., 2011, p. 1). 

TheSession.org’s tune database is organized based upon what, in theory, should be objective 
properties of the tune: tune information within the bibliographic facet such as title, key/mode, and type 
(meter). Downie describes this bibliographic information, which he terms “music metadata,” as the only 
information not derived directly from the content of the score (2003, p. 301). Ideally, bibliographic 
information should be the result of a group consensus that gives a shared meaning to the tune record. In a 
perfect world, each tune would have an agreed-upon key/mode, one unique title, and an agreed-upon setting 
of the tune. This is rarely the case with traditional music because, “although classification seems like an 
objective task, the definition of categories...is subjective in its nature” (Lidy et al., 2010, p. 1043). 

Locating specific tunes within Irish music databases without knowing a title or any recording 
information may require the use of an additional notation language, an ASCII-character representation of 
a melody called ABC Notation. This notation format is a basic text rendering of information contained in 
the pitch and rhythm facets, that is, a translation of the musical score. Because ABC is based on ASCII characters, 
tunes in ABC can be machine processed and analyzed using algorithms for textual information retrieval 
(Duggan et al., 2008, p. 25). This is ideal for quick machine processing; however, Irish traditional music does 
not necessarily lend itself to transcription in ABC or any other form of notation. 

Many types of traditional musics, including Irish traditional music, did not evolve in transcribed 
form. They evolved aurally and have continued to change and be shaped through a natural process of oral 
transmission. This means that there is not one correct or authoritative version of the information object, 
but rather many objects can be seen as representations of the same whole information object. A single 
setting, or version, of a tune cannot be considered the “correct” one; all settings of a tune should be 
considered equal in importance and should have the opportunity to be deemed the most relevant to a query. 
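
One way to picture this is a record that treats titles and settings as lists of equals rather than single authoritative values. The sketch below is an illustration under that assumption. The class names, fields, and example values are hypothetical and do not describe how TheSession.org stores its data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Setting:
    """One version of a tune; no setting is marked as the correct one."""
    key_mode: str
    abc: str
    contributor: str = "unknown"


@dataclass
class Tune:
    """A tune as a bundle of equally valid titles and settings."""
    titles: list[str]
    tune_type: str
    settings: list[Setting] = field(default_factory=list)

    def key_modes(self) -> list[str]:
        # Every key/mode in which some setting has been contributed.
        return sorted({setting.key_mode for setting in self.settings})


def find_by_title(tunes, query):
    """Return tunes for which any known title contains the query text."""
    needle = query.casefold()
    return [t for t in tunes if any(needle in title.casefold() for title in t.titles)]


# Hypothetical record with two titles and two settings in different keys.
example = Tune(
    titles=["Example Reel", "The Sample Reel"],
    tune_type="reel",
    settings=[Setting("Dmaj", "ABA A2E"), Setting("Gmaj", "ded d2A")],
)
```

With this shape a query can match through any title and return every setting, leaving the judgment of which one is most relevant to the user.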

Duggan et al. (2008) note that “MIR in traditional Irish music has an additional difficulty in that 
traditional musicians rarely play tunes as transcribed in books. In fact...a good traditional musician will 
almost never play a tune twice, identically” (p. 26). This makes queries using strings of ABC notation 
problematic. Tunes can have numerous versions, and even transcriptions from the same player’s performance 
can be heard or notated differently according to a user’s own preferences and knowledge (Duggan et al., 
2009). In traditional musics, this means that sometimes the physical and cognitive paradigms are 
inseparable. 

Having one authoritative version of the tune for retrieval purposes denies the individuality, 
uniqueness, and personalization that is a part of Irish traditional music. Group consensus as to the correct 
setting of a particular tune is impossible, as practitioners often disagree about large and small elements of 
their shared culture, making the perspective of one practitioner potentially equal to the perspective of 
another. Also, in a traditional music culture, any and all information could be considered informative. 
Buckland’s (1991) theoretical grappling with the physical paradigm actually applies directly to Irish 
traditional music and the information systems built to house it: “If anything is, or might be, informative, 
then everything is, or might well be information” (p. 356). Especially in the case of traditional music, 
everything is considered information and should be accessible to the information seeker. 


1.2. Information Seeking Challenges 


TheSession.org is an example of what Downie calls a Locating MIR System, a system “designed to assist in 
the identification, location, and retrieval of musical works” (2003, p. 309). Users of Locating MIR systems 
generally have the end goal of performing the musical works they retrieve from the system. The tunes 
database of TheSession.org contains individual tunes, and the recordings database houses recordings with 
track listings hyperlinked to the individual tune records. The information seeking and retrieval process 
within TheSession.org, for example, involves performing queries by a combination of tune key/mode, tune 
title, and tune meter, or by following hyperlinked tune titles listed within album tracks in the recordings 
database. 

The options for searching the Tunes section within TheSession.org allow the user to input or leave 
blank combinations of music metadata such as tune type (jig, reel, hornpipe, etc.), key/mode, and either 
the tune title or an ABC string, for example: ABA A2E ~G. The system returns all results matching the 
query text, meaning that an item is retrieved if it contains any of the queried terms, even when not all of 
the words are present in it. For ABC string searches, the system returns ABC results irrespective of 
octave (called “octave normalization,” and present in other MIR systems), which makes searches less precise (Duggan 
et al., 2008). 
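
The effect of octave normalization can be sketched as follows. In ABC, upper-case and lower-case note letters lie an octave apart, and trailing commas or apostrophes shift a note down or up a further octave. The code below is a simplified illustration and not TheSession.org's implementation: it keeps only the note letters, so it also discards durations, ornaments such as ~, accidentals, and rests, and it assumes the input contains no quoted chord symbols. The function names are hypothetical.

```python
import re


def octave_normalize(abc_fragment):
    """Reduce an ABC fragment to its sequence of pitch letters, ignoring octave."""
    # Keeping only A-G/a-g drops octave marks (, and '), note lengths,
    # bar lines, and ornaments; upper-casing removes the octave distinction.
    letters = re.findall(r"[A-Ga-g]", abc_fragment)
    return "".join(letter.upper() for letter in letters)


def matches_ignoring_octave(query, tune_body):
    """True if the query's pitch letters occur consecutively in the tune."""
    return octave_normalize(query) in octave_normalize(tune_body)


normalized = octave_normalize("ABA A2E ~G")
```

The example query from the text and the same phrase written an octave higher (aba a2e ~g) both reduce to the same letter string, so either retrieves the same tunes. The search tolerates differences of register at the cost of precision.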

There are a number of system limitations that appear when searching the Tunes database within 
TheSession.org. As Uitdenbogerd and Zobel (2004) discovered, query mismatch between user and system 
can be the result of differing keys or modes, sung or hummed queries and pitch inaccuracy, and tune 
arrangements obscured through variation and ornamentation. A search by key/mode and tune type can be 
problematic, as traditional tunes can be played in several keys and meters. In addition, the text-based 
retrieval system cannot compensate for ornamentation in either the query or the result, nor can it 
transpose an ABC string from the initial query to match tunes in other keys/modes. 
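
A common way to make melodic matching independent of key is to compare the steps between successive notes instead of the notes themselves. The sketch below illustrates that idea on ABC fragments. It is not a feature of TheSession.org, the function names are hypothetical, and it counts diatonic scale steps only, ignoring accidentals, key signatures, rhythm, and ornaments.

```python
import re

# Position of each note letter within a seven-note scale starting on C.
_STEP = {"C": 0, "D": 1, "E": 2, "F": 3, "G": 4, "A": 5, "B": 6}


def diatonic_positions(abc_fragment):
    """Map each ABC note to a scale-step number that includes its octave."""
    positions = []
    for letter, marks in re.findall(r"([A-Ga-g])([,']*)", abc_fragment):
        octave = 1 if letter.islower() else 0  # Lower case is one octave up.
        octave += marks.count("'") - marks.count(",")
        positions.append(_STEP[letter.upper()] + 7 * octave)
    return positions


def interval_profile(abc_fragment):
    """Steps between successive notes; unchanged when the tune is transposed."""
    positions = diatonic_positions(abc_fragment)
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def contains_transposed(query, tune_body):
    """True if the query's interval pattern occurs anywhere in the tune."""
    wanted = interval_profile(query)
    found = interval_profile(tune_body)
    if not wanted:
        return False  # A single note has no intervals to compare.
    last_start = len(found) - len(wanted)
    return any(found[i:i + len(wanted)] == wanted for i in range(last_start + 1))


profile = interval_profile("ABA A2E")
```

Here the phrase ABA A2E and its transposition ded d2A share the same profile, so a query in one key would find a setting in the other. Ornamented or varied settings would still be missed, which is the larger problem the paper describes.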

Some MIR scholars continue to argue that text-based systems that retrieve basic metadata will 
successfully meet the needs of information seekers. Inskip, MacFarlane and Rafferty (2010) articulate this 
assumption: “...Known-item searching for music can be dealt with by searching metadata using existing text 
search techniques.” Again, this assumption has its foundation in the Western Classical approach by MIR 
scholars, and has limited application to musics with different characteristics like Irish traditional music. 
With a corpus of over 7,000 tunes (Duggan et al., 2008), a user cannot expect to find the correct tune 
reliably within TheSession.org or any other repository of Irish traditional music without also knowing one 
of its titles. 

Searching by ABC string (or by title, key, or meter) implies a stability that is not present in any
of the information objects, as these are based in traditional practice. Traditions are carried on by the people
who practice them, and instability in information objects is expected. Raber (2003) summarized the
user-centered view of relevance by stating that, at their heart, information needs are unstable. This means that
relevance itself is unstable and cannot be neatly translated into a search query (Raber, 2003). Hjørland 
(2007) considers the instability of information to be rooted in the subjectivity of what can be considered 
informative. Traditional music is inherently subjective, making user-centered relevance even more important 
when designing MIR systems to accommodate such diverse viewpoints. 
1.3. User-centered relevance.


User-stated relevance is challenging within TheSession.org because of the geographic and linguistic diversity 
of its users, as well as their range of expertise in Irish traditional music. To formulate the initial query, a 
user has to have some knowledge and must be able to identify which items will be relevant or not relevant 
to it (Baeza-Yates & Ribeiro-Neto, 1999). Novice to expert users are thought of in terms of comfort and 
depth of knowledge in Irish traditional music. A novice user might be limited by inability to think beyond 
his or her instrument or by insufficient knowledge about the music to make advanced search decisions. The 
idea of expert versus novice user relevance judgments is also problematic when trying to consider what 
defines someone as an expert. 

In the actual information seeking and retrieval process, the limitations of the physical paradigm 
may result in a number of problems for TheSession.org users when determining relevance of music 
information. Some of these problems derive from the differences between the written physical object and 
the actual musical performance of the written object (Uitdenbogerd & Zobel, 2004). Others are based upon 
the various levels of knowledge of the users creating the objects and the users seeking those objects. These 
limitations, described in greater detail below, are divided into the following sections: title, meter, 
ornamentation, key or mode, and relevance feedback. 

The musical background and proficiency of the user become central to the success or lack of success
with MIR queries (Bainbridge, Dewsnip, & Witten, 2005; Lesaffre et al., 2008). Some Irish traditional music
practitioners may be exclusively aural learners, while some may also read sheet music. Those users who
cannot read sheet music or ABC notation cannot recognize the correct tune among the query results without
hearing a MIDI rendering; however, site-wide updates to TheSession.org in 2012 deliberately removed the
auto-generated MIDI files. Users responded within discussion threads with mixed reactions, for example: “I 
miss the MIDIs” and “Agree re midis, I find them useful sometimes just to get a quick sense of what’s 
notated to see if it matches the tune I’m thinking of. But it’s easy to find another utility to paste the ABC 
in and do that - just a bit fiddly” (TheSession.org, 2012). 

Those users who read sheet music but are new to Irish traditional music might not have the 
necessary musical skills, or knowledge of ABC notation, to transpose ABC search strings to find a tune 
submitted in a different mode or key. As Bainbridge, Dewsnip, and Witten (2005) observe of user-submitted 
melodic queries, “absolute pitch is not in general practical because it is key-dependent, and pitch contour 
requires too many query notes to be useful” (p. 55). These users might also not know which tune type to 
provide for the initial query if they first heard a tune in a session context, live performance, or from a 
recording. 


1.4. Title. 


Irish traditional tunes can have Anglicized or Irish titles. Baeza-Yates and Ribeiro-Neto (1999) note that 
problems with relevance occur when terms are misspelled, during cross-language information retrieval, and 
with a vocabulary mismatch between system and user. An example of this is the jig Hag with the Money, 
known also by its Irish song titles as Si Do Mhamó Í (She Is Your Granny) or Cailleach An Airgead (The 
Hag With Money). Irish Gaelic can be misspelled or provided without the necessary accents by users not 
fluent in the language. Even if users can locate the correct tune, if they are not Irish speakers, they may 
not know which version of the Irish title to use among all of the versions submitted. 

Tunes might also have titles that use vernacular terms, such as “Peeler” for policeman, “Tinker” 
for tinsmith (and traveling musician, usually), or even “Aisy” as vernacular for “Easy” and “Ha’Penny” for 
“Halfpenny.” Users searching for “My Mind Will Ne’er be Aisy” may not find the tune they are seeking, 
submitted as “My Mind Will Never Be Easy.” Searching for pieces of the title instead, such as “My Mind
Will,” may yield irrelevant results with the word “Mind” in them, such as the jig “Round the House and Mind
the Dresser.” TheSession.org’s algorithm does not use the sequence of search terms “My Mind Will” to find
a match with the first three words “My Mind Will”; it simply searches for tunes containing any of those
words.
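The difference between matching any term and matching an ordered phrase can be sketched as follows. This is a minimal illustration using the two titles discussed above. The function names are hypothetical, and the code does not reproduce TheSession.org's actual search implementation.

```python
TITLES = [
    "My Mind Will Never Be Easy",
    "Round the House and Mind the Dresser",
]


def tokens(text: str) -> list[str]:
    """Lowercase a title or query and split it into words."""
    return text.lower().split()


def matches_any_term(query: str, title: str) -> bool:
    """True if the title shares at least one word with the query."""
    return bool(set(tokens(query)) & set(tokens(title)))


def matches_leading_phrase(query: str, title: str) -> bool:
    """True only if the title begins with the query words, in order."""
    q, t = tokens(query), tokens(title)
    return t[:len(q)] == q


query = "My Mind Will"
print([t for t in TITLES if matches_any_term(query, t)])        # both titles
print([t for t in TITLES if matches_leading_phrase(query, t)])  # first title only
```

Any-term matching retrieves the unrelated jig because it shares the single word “Mind,” whereas the phrase-aware variant keeps only the intended tune.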

Nameless tunes can be a problem too, usually submitted by a novice user who doesn’t have a large 
tune repertoire or by a user transcribing from an album. A practice among some bands — starting with The 
Chieftains and continuing today with some bands such as Lúnasa — is to give a title like “The Dingle Set” 
to a medley of several tunes on an album, making it more difficult to locate any of the tunes individually 
by title. Gan Ainm (without name) is traditionally used when musicians either cannot think of the name 
or have never learned a name for a tune. Users unfamiliar with this practice may wonder why so many tunes 
are called Gan Ainm, or they may see the name on a recording track and attempt to locate it using this 
phrase as the tune title. Naming a tune after the person from whom it was learned, or with whom it is
associated, is another common practice when musicians do not have a name for it.

Additional problems with Irish tune names and MIR involve the use of identical or nearly identical 
names for different tunes. This is especially true with polkas and slides, as they are named either after a 
musician or after towns or geographic areas within the Sliabh Luachra region in Ireland, meaning there are 
multiple tunes using the same or very similar names: Dennis Murphy’s, Ballydesmond #1 and Ballydesmond 
#2, just to name a few. 

The problem of names that apply to more than one tune and/or tune type also occurs outside of 
polkas and slides. Some tune names apply to two tunes that are of different tune types, or have different 
keys/modes. For example, “The Boys of Ballisodare” is a G Major reel, whereas “The Boys of Ballisadare” 
is a G Major hop jig. The melodies are not connected; however, the two spellings are frequently confused
and each title is mis-assigned to the other tune. Other examples are tunes of the same type, but differing in key/mode,
melody, and number of parts. Examples of this include two jigs that both go by the name “King of the 
Pipers,” two jigs called “Pipe on the Hob”— one in A Dorian and the other in D Mixolydian — and the two 
jigs named “The Gold Ring” — one with four parts in D Major and one with six parts in G Major. 

For novice users, this traditional practice of assigning similar or identical names for different tunes 
makes MIR challenging. Also, given that musicians frequently discover new tunes from one another at in- 
person sessions, users attempting to find and learn a tune from a session may not remember that particular 
tune’s key/mode, or number of parts. With only a memory of the tune and the title given by the musician 
playing it, the user is faced with a dilemma over which “Gold Ring” is actually the one relevant to his or 
her query. 


1.5. Meter. 


Adding to this naming challenge is that some tunes are derived from a common melody and are manifest 
in several types of tunes. This is common with slow air melodies that are turned into either hornpipes, set 
dances, or jigs. An example is “The Blackbird” slow air, hornpipe, and set dance derived from the same 
melody. Other derivatives are from the old harp repertoire of the 17th and 18th centuries. The harp piece
“Molly McAlpin” turned into the hornpipe “Poll Ha’penny” or “Paul Halfpenny.” A user would need to 
query each individual tune type and would have to know that they relate to one another. Unfortunately, 
there is not a way within TheSession.org’s system to link the melodic derivatives together except by using 
the comments section. 

Slow airs and harp pieces, such as those by O’Carolan, along with marches are all tune types that 
cannot be classified by meter/time signature like other tune types. Reels, jigs, slip jigs, and hornpipes are
all written in the same meter across the tune type; however, marches are either in 2/4 or 6/8, harp pieces
are in 4/4 or 3/4, and slow airs are unmetered. Because slow airs are unmetered, they are difficult
to transcribe and represent within a metered framework such as ABC notation. Users have submitted harp
pieces, marches, and slow airs under the incorrect tune type in order to represent them within a metered
framework they feel is appropriate. All clarifications as to the actual tune type (march, air, harp piece) are
contained within the comments section, and, because comments cannot be queried, users might never find
a harp piece among the reels section without knowing the tune title.

Submitting airs, harp pieces, and marches under an incorrect tune type is only one way to confuse users
seeking tunes by meter, key/mode, title, or via ABC string search. Some tune types can be transcribed with
various fractional note values, such as polkas, represented both by quarter and eighth notes or by eighth
and sixteenth notes. Sometimes users who are not adept in transcription will use even smaller fractional
note values like sixteenth and thirty-second notes, rendering an ABC string search useless. If the
ABC-string search could compensate for differences in fractional note values, key/mode, and octave, it
would be more valuable to users.
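One way such compensation for fractional note values might work is to compare relative rather than absolute durations. The sketch below assumes a simplified ABC subset in which each note is a letter, optional octave marks, and an optional whole-number length multiplier (fractional lengths such as `A/2` are deliberately left out). The names `parse` and `relative_durations` are hypothetical.

```python
import re
from functools import reduce
from math import gcd

# A note letter, optional octave marks, and an optional integer length.
NOTE = re.compile(r"([A-Ga-g][,']*)(\d*)")


def parse(abc: str) -> list[tuple[str, int]]:
    """Return (pitch, length) pairs; a missing length means one unit."""
    return [(pitch, int(n) if n else 1) for pitch, n in NOTE.findall(abc)]


def relative_durations(abc: str) -> list[tuple[str, int]]:
    """Rescale all lengths by their greatest common divisor."""
    notes = parse(abc)
    if not notes:
        return []
    unit = reduce(gcd, (length for _, length in notes))
    return [(pitch, length // unit) for pitch, length in notes]


# The same polka figure written in two different note values.
in_eighths = "A2 B c"
in_quarters = "A4 B2 c2"
print(relative_durations(in_eighths) == relative_durations(in_quarters))
```

After rescaling, the two transcriptions compare as equal even though their literal ABC strings differ.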


1.6. Ornamentation. 


Other transcription problems come from whether or not the user submitting the tune chose to do so with 
ornamentation. Irish music can be ornamented differently according to instrument type, and a user may 
choose to submit a transcription suited to one instrument in particular. Any presence of ornamentation 
within a tune transcription makes searching by ABC string very problematic, and requires a high level of
knowledge from the user as to how to formulate ABC strings that contain similar melodic figures with
different ornaments or note durations. Idiomatic transcriptions (those suited to a particular
instrument) can vary widely in ornamentation types, melodic figures, and even notes of the tune (Duggan
et al., 2009). Idiomatic transcriptions base the notation and ornamentation on what suits a particular
instrument’s strengths and limitations, as well as generally-accepted performance practices. For example, 
accordions use treble ornaments in place of rolls as on the fiddle. Also, wind instruments can roll on every 
note except the lowest D, whereas fiddles cannot roll on open strings (EADG). A fiddler may search for a 
jig using the ABC string “ABA GED DED GED” instead of “~A3 GED ~D3 GED,” but if the tune was 
transcribed and submitted by a pipes or flute player using the latter example, the query will yield no results. 
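To make the fiddle and pipes example concrete, the sketch below reduces both spellings to a shared melodic skeleton before comparing them. The reduction rule is a hypothetical heuristic introduced only for illustration: a rolled dotted quarter (`~A3`) and a three-note figure that starts and ends on the same note (`ABA`) are both collapsed to that main note. It is not the ornament-removal method of TunePal or any other cited system.

```python
import re

ROLL = re.compile(r"^~([A-Ga-g])3$")            # e.g. "~A3"
TURN = re.compile(r"^([A-Ga-g])[A-Ga-g]\1$")    # e.g. "ABA" or "DED"


def skeleton(abc: str) -> list[str]:
    """Collapse rolls and same-note three-note figures to their main note."""
    reduced = []
    for group in abc.split():
        match = ROLL.match(group) or TURN.match(group)
        reduced.append(match.group(1) if match else group)
    return reduced


fiddle_query = "ABA GED DED GED"
pipes_version = "~A3 GED ~D3 GED"
print(skeleton(fiddle_query))
print(skeleton(fiddle_query) == skeleton(pipes_version))
```

A rule this coarse would also merge phrases that players consider distinct, so a real system would need to weigh the gain in recall against the loss of precision.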

Hornpipes are a tune type that may be transcribed with straight eighth notes, or as dotted eighth 
and sixteenth notes to imitate the rhythmic swing in performance. Hornpipes are played differently than 
they are notated, but some users submit hornpipe transcriptions that contain ornamentation as well. 
Experienced players will add ornaments without needing them represented on paper, so when a tune is 
submitted with embellishments, it complicates retrieval by obscuring the basic tune melody. Phrases that
sound equivalent to a traditional player might be notated in a number of ways, with triplets, cuts, rolls, or
melodic variation to complicate the process.


1.7. Key or mode.


The previous examples demonstrate the limitations of using ABC-string searches with any number of
combinations of notes, fractional note values, and various forms of ornamentation that may be present in 
tune transcriptions. Differences in key/mode are more commonplace in the modern Irish tradition, further 
complicating information seeking and retrieval. Some bands like Dervish and Lúnasa tune instruments up 
to Eb instead of the traditional D Major. Certain performers, fiddlers John Carty and Martin Hayes being 
notable examples, perform personalized versions of tunes that are sometimes transposed to a different key 
from the original tune. Performers may also record tunes on concertinas, accordions, and whistles pitched
in various keys. Fiddlers past and present make use of cross-tuning, which changes the standard tuning of
the instrument to one designed to give a drone-like effect, as well as tuning the fiddle strings down
incrementally to give a more viola-like sound. Uilleann pipes also come in lower keys, such as low pipes
in B Major, instead of the typical D Major. Because transcriptions are often made from recordings of
instruments in atypical keys, identical melodic lines in other keys would not match the transcription
unless a system could transpose melodies and match the shape of the melodic line.
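Matching the shape of the melodic line can be sketched by comparing the steps between successive notes instead of the notes themselves. The example below is a simplified assumption, not a description of any existing system: it counts diatonic steps from the note letters alone and ignores accidentals, key signatures, durations, and ornaments. The names `diatonic_steps`, `contour`, and `occurs_in` are hypothetical.

```python
import re

NOTE = re.compile(r"([A-Ga-g])([,']*)")
LETTERS = "CDEFGAB"


def diatonic_steps(abc: str) -> list[int]:
    """Map each note to a scale-step number, seven steps per octave."""
    steps = []
    for letter, marks in NOTE.findall(abc):
        # Lowercase letters sit one octave up; "'" and "," shift further.
        octave = (1 if letter.islower() else 0) + marks.count("'") - marks.count(",")
        steps.append(7 * octave + LETTERS.index(letter.upper()))
    return steps


def contour(abc: str) -> list[int]:
    """Step-wise movement between consecutive notes."""
    steps = diatonic_steps(abc)
    return [b - a for a, b in zip(steps, steps[1:])]


def occurs_in(query: str, tune: str) -> bool:
    """True if the query's contour appears contiguously in the tune's contour."""
    q, t = contour(query), contour(tune)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


fingered_in_d = "DFA dAF"    # as a piper would finger the phrase
sounding_in_b = "B,DF BFD"   # the same phrase as heard on low pipes in B
print(contour(fingered_in_d) == contour(sounding_in_b))
print(occurs_in("DFA", sounding_in_b))
```

Because only letter names are compared, this sketch would also treat the Major and Mixolydian settings of a tune as the same shape, which may or may not be desirable depending on the search.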

Users may submit tunes with a key or mode based upon a recording using an atypical instrument 
tuning or a performer’s personalized version in another key/mode from the original. This scenario would 
apply to the user who has perfect pitch and transcribes a tune based upon the alternate key/mode, not 
taking the instrument into account. Even if the user doesn’t have perfect pitch but simply compares the
recorded sound to his or her instrument, he or she would discover the key used in the recording. A tune 
played on the low pipes, for example, might be fingered in D Mixolydian but would sound in B Mixolydian— 
an extremely rare mode for Irish traditional music. 

Some tunes are traditionally played in two different modes, usually Ionian (Major) and Mixolydian 
modes. The slip jig “Gusty’s Frolics” is played both in D Mixolydian, using C naturals, and in D Major, 
using C sharps. Sometimes a performer mixes the two modes by switching between a C natural and C sharp, 
what Ó Canainn terms note “inflection” (1978, p. 30). Tunes can also have multiple keys/modes for different
parts, called “multi-modal” tunes. An example of a multi-modal tune is one with an A part in G Major and 
B part in D Mixolydian. For multi-modal tunes, it might be difficult for a user to know which key/mode to 
select when formulating a search query. Likewise, tunes that appear in multiple modes, like the “Gusty’s 
Frolics” example, will not be retrieved if the tune was submitted in D Mixolydian and the user selects D 
Major for his or her slip jig search. 


1.8. Relevance feedback. 


Ideally, the MIR system would also employ some form of relevance feedback. Saracevic (1975) describes 
relevance as “a measure of the effectiveness of a contact between a source and a destination in a 
communication process” (p. 325). This “communication process” is one between the user (source) and the 
system’s contents (destination), where the user establishes what is relevant by providing feedback to the 
system. TheSession.org’s MIR system cannot perform well without considerable effort and input from the
user; however, no formal type of relevance feedback is employed. Users need to interact with the system in
order to allow this communication process to occur; otherwise the MIR system might not be very effective
at performing its function.

As Saracevic (1975) suggests, employing relevance feedback requires a multi-step process involving
explicit feedback from the user to further refine queries. The relevance feedback process evolves as the user 
interacts with the MIR system, and in some cases the user finds that his or her information need evolves 
during contact with the items retrieved by the system (Baeza-Yates & Ribeiro-Neto, 1999, p. 178). For 
instance, a user searching for what he or she thinks is a jig may input an ABC search string that pulls up 
only slip jigs and reels. The user may examine these results to determine the relevant result was, in fact, a 
slip jig and not a jig. In this way, users might also have a stronger sense of their information need after it 
has been affected by the retrieval process, contributing to a change in the user’s concept of what is relevant 
(Froelich, 1994). 
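One classic way to formalize this refinement loop is the Rocchio update, which moves a weighted query toward items the user marks as relevant and away from items marked as not relevant. The source does not name a particular method, so the sketch below is an assumption for illustration. The feature labels (such as `type:jig`) and the weights `alpha`, `beta`, and `gamma` are hypothetical.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query as a dict of feature weights.

    query: dict mapping feature -> weight
    relevant / non_relevant: lists of such dicts for items the user judged
    """
    updated = Counter({feature: alpha * w for feature, w in query.items()})
    for items, sign, weight in ((relevant, 1, beta), (non_relevant, -1, gamma)):
        for item in items:
            for feature, w in item.items():
                # Average each group's contribution over its size.
                updated[feature] += sign * weight * w / len(items)
    # Features pushed to zero or below drop out of the refined query.
    return {feature: w for feature, w in updated.items() if w > 0}


# The user searched for a jig, then judged a slip jig relevant and a reel not.
query = {"type:jig": 1.0, "abc:GED": 1.0}
relevant = [{"type:slip jig": 1.0, "abc:GED": 1.0}]
non_relevant = [{"type:reel": 1.0, "abc:GED": 1.0}]
print(rocchio(query, relevant, non_relevant))
```

In this toy run the refined query gains a slip jig feature and never acquires a reel feature, which mirrors the user discovering that the tune sought was a slip jig rather than a jig.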


1.9. Current and Future Research 


Understanding more about how users interact with MIR systems, especially in the area of non-Western 
common practice music, is essential for their future development. In 2003, Downie listed non-Western MIR 
as one of ten important areas for future development in the field (pp. 328-329). A decade later, the field is
still in the early stages of exploring non-Western music as a subject for MIR research. Duggan’s
work in MIR and Irish traditional music will likely benefit those working with other traditional musics, as 
issues of ornamentation, rubato, transposition, breathing on wind instruments, and octave normalization 
are not unique to Irish music. Other traditional musics employ non-Western common practice scales, tuning 
systems, and unique methods of improvisation and will require additional MIR tools as well (Uitdenbogerd
& Zobel, 2004; Tzanetakis et al., 2007). As with Irish music, many non-Western common practice musics
resist precise and standardized methods of transcription, making them similarly problematic to query.

Flexible and intelligent search capabilities are also needed to accommodate diverse backgrounds of
users and the somewhat unpredictable nature of representing and transcribing Irish traditional music. 
Played or sung queries for known-item searching, ABC-string searches, natural language queries, musical 
feature queries, and even those based on traditional musical metadata could be used in combination to make 
searches more robust and more accessible to users of all backgrounds seeking many different kinds of music
information. Haus, Longari, and Pollastri (2004) describe a similar system, though one aimed more at Western
Classical music, as a “cross score/audio integrated database environment in which one can find any kind of
information by singing, playing, and writing score excerpts” (p. 1046). While a notation-centered retrieval 
system falls short with orally-centered traditions like Irish music, the varied approach to user query input 
is helpful. 

Researchers like Lidy et al. (2010) are beginning to explore whether “current MIR approaches can 
also be applied to collections of non-Western and, in particular, ethnic music with completely different 
characteristics and requirements” (p. 1032). Bainbridge, Dewsnip, and Witten (2005) emphasize the 
importance of approximation and flexibility in user-centered MIR systems, which has particular applicability 
to the challenges of Irish traditional music and other world musics: “Approximate matching is a necessity 
for melody retrieval. There are many opportunities for errors to be introduced into queries: poorly 
remembered tunes, pitch and duration errors (if queries are hummed or sung), and errors introduced by the 
transcription process (again, if queries are hummed or sung). These errors have dire consequences on the 
performance of the exact algorithms, and this makes exact algorithms unsuitable for melody retrieval. Also, 
approximate algorithms allow users to find tunes that are similar to a query” (p. 55). The shortcomings of 
sung or hummed queries would likely extend to those played on instruments, as the user's proficiency and 
musical background are also factors in an instrument performance. 
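Approximate matching of the kind Bainbridge, Dewsnip, and Witten call for is often built on edit distance. The sketch below uses plain Levenshtein distance over note strings as one common choice. The quoted source does not specify an algorithm, and the tune labels and ABC strings in the example are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def rank(query: str, tunes: dict[str, str]) -> list[str]:
    """Order tune names from closest to farthest from the query."""
    return sorted(tunes, key=lambda name: edit_distance(query, tunes[name]))


# Hypothetical corpus: one tune differs from the query by a single note.
tunes = {"Tune X": "ABAGFDDEDGED", "Tune Y": "GABcdefgabag"}
print(rank("ABAGEDDEDGED", tunes))
```

An exact string match would return nothing for this query, whereas ranking by distance still places the near-identical tune first.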

TunePal, a MIR tool for Irish traditional musicians, is a recent development designed to query Irish 
music databases such as TheSession.org using sung or played queries on traditional instruments (Duggan
et al., 2008). TunePal derives from the earlier MATT2, or Machine Annotation of Traditional Tunes, the
“first attempt to adapt MIR to the specific characteristics of traditional Irish dance music and to support 
queries played on traditional instruments” (Duggan et al., 2008, p. 27). Audio input from an Irish traditional 
musician playing his or her instrument is transcribed into the ABC notation language, and then that 
transcription is matched among tune corpora in several databases. TunePal accounts for rubato (variation
in performance speed) and breathing for wind players, and it employs octave normalization, along with an
additional step to remove ornamentation from the player’s performance of the tune query to improve melodic
contour representation abilities (Duggan et al., 2008).

There are limitations to TunePal’s abilities to match ABC transcriptions of played tunes with ABC- 
notated tunes contained in databases like TheSession.org. Because TunePal translates the user's audio query 
into text-based notation for matching purposes, it falls prey to the previously discussed shortcomings when 
Irish music is transcribed. TunePal cannot transpose ABC transcriptions and search for matches in other 
keys/modes. Altered keys and multi-modal tunes render an ABC string search useless, as the ABC search matches
exact pitches and rhythms rather than melodic contour or relative fractional note values, and it cannot transpose
to other keys. Users would have to be expert enough to guess other possible keys, and transcribe melodic
ABC fragments accordingly. A more useful MIR system could account for differences in key/mode, and 
transpose strings of ABC notation to locate relevant results. 

In addition, Duggan's tools cannot process multiple tunes in quick sequence played as a medley 
(Duggan et al., 2008). Irish traditional musicians almost always combine tunes into medleys of two or more, 
so this limits the sources from which TunePal can match queried versions played by traditional musicians. 
If both systems could detect and differentiate between tunes in a medley, future developers could use large 
numbers of commercial and archival recordings of tunes as data to enhance matching capabilities and 
account for the diversity and uniqueness present in Irish traditional music performance. 

An ideal MIR system for Irish traditional music might incorporate the organization principles of 
linked open data to connect a diverse array of information objects from cultural institutions, individuals, 
scholars, and music practitioners. This information might include: image representations of sheet music, 
ABC notation and other transcriptions, metadata of tune recordings, sound files in MIDI, archival and 
modern recordings, anecdotal information of musicians strongly associated with specific tunes, geographic
information, regional style information, alternate titles, historical context, known composers, melodic
variants and/or derivatives (such as The Blackbird example mentioned earlier), and other contextual
information. Linked data may provide the means, but those involved with the construction of such
meaningful information connections should include music practitioners in addition to more scholarly sources.


2 Conclusion 


TheSession.org is an example of an environment where information, description, and organization are
user-created, like a folksonomy. As with a folksonomy, the diversity of users’ submissions and comments enriches
understanding of the information at hand, yet this makes it difficult to organize effectively for retrieval. All
user contributions are valuable because they can all be potentially informative, which makes everything
within TheSession.org “information” according to Buckland (1991). Irish traditional music resists the idea
of a single, authoritative information object. Because any item of text, hyperlink, or other type of
contribution by one practitioner of traditional music can be considered as informative as the next,
it is essential that these be accessible to users via flexible search techniques and integrated query results.

While the technicalities of developing music information retrieval systems flexible enough to meet
the demands and uses of user communities like traditional music practitioners are enough of a challenge, the
greater challenges are in the areas of access and collaboration. Large stores of recordings and other
traditional music information lie in archives and personal collections across the globe (Seeger, 1996;
Jorgensen, 2004; Proutskova, 2007), an access challenge outside the scope of this paper, though noteworthy
all the same. Music information retrieval researchers are also hindered in their development of new
techniques and tools by the limited availability of audio data sets, due either to copyright restrictions
or to inaccessibility within archives or other cultural institutions.

Van Kranenburg et al. (2010) argue that Computational Ethnomusicology would allow an
unlocking of large musical data sets by extracting and processing relevant melodic and musical feature
information. While access remains a looming concern, the description and processing of such information
might depend less critically on automating such processes than on collaboration and involvement
with practitioner communities and those who study them. MIR systems suited to the particular instabilities
inherent in non-Western music will develop through an increased interest in this area by MIR researchers
and system developers; increased collaboration between the MIR research community and cultural heritage
organizations; collaboration between the MIR research community and ethnomusicologists, folklorists,
and anthropologists working directly with traditional music practitioners; and, particularly, collaboration
between all entities and the music practitioners themselves.

Music is more than the sum of its various representations — it is both culture and information. 
There are numerous traditions embedded in musical practice and understanding that practitioners bear. 
Without harnessing this specialized knowledge, the full extent of musical information latent in every audio 
file remains locked to those without insider perspective. Future research and developments in non-Western 
common practice music information retrieval depend upon better understanding the information needs and 
uses of traditional music practitioners, as well as better understanding representation and organization 
problems specific to those musics. When “everything” is information — as with Irish traditional music — 
access to, description of, and organization of world music by, or in collaboration with, practitioners becomes 
a defining factor in how the music information retrieval field will advance with such music. 
Abstract

Libraries have long maintained strong protections for patron privacy and intellectual freedom. However, 
the increasing prevalence of sophisticated surveillance systems in public libraries potentially threatens 
these core library commitments. This paper presents the findings of a qualitative case study examining 
why four libraries in the US and the UK installed video surveillance and how they manage these systems 
to balance safety and privacy. We examine the experience of these libraries, including one that later 
reversed course and completely removed all of its previously installed systems. We find that the libraries
that install surveillance initially do so either as a response to specific incidents of crime or as part of the
design of new buildings. Libraries maintain varying policies about whether video footage is protected as 
part of patron records, about dealing with law enforcement requests for footage, and whether patrons 
ought to maintain any expectation of privacy while inside libraries. 
1 Introduction


Video surveillance is an ever-increasing issue in modern society. More and more security cameras and other 
surveillance technologies such as drones, biometric recording, facial recognition and even genetic profiling 
tools are becoming prevalent in our everyday lives. As a result there is a very real struggle emerging between 
maintaining personal privacy and ensuring a certain level of security, as well as upholding the law. Current 
large-scale security systems with networked video cameras, expansive control rooms, roaming security 
guards, and the incorporation of other cutting edge technologies, can be compared to the Panopticon designs 
of the English social theorist Jeremy Bentham (Mike, 1990; Norris, 2003). The possibility that the modern 
public library, with the profession’s long-held commitment to privacy and intellectual freedom, could be
compared to Bentham’s panoptic prison in which the few — as largely unobservable observers — watch the 
many in an act of power and domination, is striking. If video surveillance has the potential to change power 
relationships between the state and its citizens (Forcese & Freeman, 2011; Webster, 1998) and negatively 
affect civil liberties, its implementation and management in the public library setting should be studied 
rigorously. Surprisingly, there is scant literature addressing the subject, and the research that does exist
largely ignores the important civil liberties issues.

Bentham’s panoptic vision involved a centralized ability of those in power to monitor large numbers 
of others, who had no ability to watch back — and often had no idea when they were actually being watched. 
It was designed, according to Bentham, so as to effect “a new mode of obtaining power of mind over mind,
in a quantity hitherto without example” (Lyon, 2006). The design became the basis of Michel Foucault’s
theory of Panopticism, which examines the way discipline, power and punishment work in modern society 
(Foucault, 1977). Foucault demonstrated that “there is a reciprocal relationship between power and space” 
(Koskela, 2000), and argued that a city can be seen as a “laboratory of power” (Foucault, 1977; Koskela, 
2000). Many surveillance theorists have moved beyond a religious adherence to Foucault’s panopticism, but
his theories continue to underlie much surveillance discourse (Lyon, 1994) and many have found parallels 
between surveillance-filled cities and Foucault’s ideas (Ainley, 1998; Cohen, 1985; Fyfe & Bannister, 1996; 
Hannah, 1997; Herbert, 1996; Koskela, 2000; Soja, 1996; Newell & Randall, 2013). Recent research also 
suggests that the implementation of video surveillance may lead to urban “purification” and discriminatory 
“social sorting” (Lomell, 2002; Stalder & Lyon, 2003). Other research has found that video surveillance 
itself is not effective at preventing crime, and that the function of video surveillance, and the rationale 
behind its further implementation, has shifted away from crime prevention (an oft cited rationale for its 
initial implementation) to matters of national security and community or workplace safety, a form of 
surveillance creep (B. C. Newell & Randall, 2013; Stalder & Lyon, 2003; Webster, 2009). 

The Panopticon necessitated a rigid architectural design, so as to achieve the goal of watching 
without being seen to be watching. However, today this panoptic goal can be achieved in almost any 
building, thanks to the prevalence of surveillance technologies. It is therefore very easy to draw a parallel 
between Bentham’s designs and many modern public institutions that have adopted video surveillance 
technologies, such as public libraries. 

We hope that this paper can begin to fill this gap in the literature. We present 
original research findings from a study of the surveillance practices of four large libraries, one in the United 
Kingdom and three in the United States, in an effort to determine what factors and considerations have 
driven these libraries to start utilizing surveillance technologies and what repercussions library 
administrations have felt as a result of the installation of those technologies. This paper builds on and 
extends our earlier research in this area (B. C. Newell & Randall, 2013). 


2 Background and Prior Research 


2.1 Video Surveillance in Public Libraries 


The International Federation of Library Associations and Institutions statement on libraries and intellectual 
freedom reads: 


“Library users should have the right to personal privacy and anonymity. Librarians and other 
library staff should not disclose the identity of users or the materials they use to a third party” 
(IFLA, 1999). 


In the United States, the American Library Association (ALA) annually celebrates its “Choose Privacy 
Week”, and the ALA’s Office for Intellectual Freedom has been actively promoting the recognition of 
privacy in the public library setting for some time. The ALA’s position in regard to video surveillance is 
particularly enlightening: 


“high-resolution surveillance equipment is capable of recording patron reading and viewing habits 
in ways that are as revealing as the written circulation records libraries routinely protect.... Since 
any such personal information is sensitive and has the potential to be used inappropriately in the 
wrong hands, gathering surveillance data has serious implications for library management” 
(American Library Association, 2006) (emphasis added). 


One library security report asserts that best practice for libraries should be to implement physical 
security measures and perform risk assessments before moving to the installation of video surveillance 
systems. However, the report goes on to state that “[video surveillance] systems are quickly becoming one of the 
most important and economical security and safety tools available to libraries” (McComb, 2004), and it does 
not even mention privacy considerations. The report also suggests that video surveillance should only be 
employed to “provide a safe and secure facility for library employees, library resources and equipment, and 
library patrons” (McComb, 2004). 

In the United Kingdom, the Chartered Institute of Library and Information Professionals’ (CILIP) 
guidelines on privacy astutely state that the use of video surveillance in libraries “raises the question of 
where the balance of security and privacy should lie” (CILIP, 2011). The guidelines also state a serious 
concern that cameras are used as an “easy solution” by library administrators who don’t take privacy into 
consideration, and recommend that, before cameras are installed at any library location, it should be: 


“clearly established that CCTV is a solution to the problem and that there are not other effective 
solutions with less impact on privacy” (CILIP, 2011). 


In 2008, CILIP issued a survey of police, surveillance and libraries within the UK in response to claims of 
increasing police requests for information on library users. The survey found that 75% of responding libraries 
had received a request for patron information from a UK police force or the Security Service (aka MI5). Sixty-six 
percent of libraries reported having a formal policy to deal with these requests, with an additional 13% only 
having a policy that addressed the Data Protection Act (1998). Interestingly, only 9% of respondents felt that they 
were the victims of police engaging in “fishing” activities (looking for patron records without reasonable 
suspicion), and one library reported that a member of the Metropolitan Police Service’s Special Branch 
had approached staff members and asked them to report patrons visiting extremist websites directly to 
him (CILIP, 2008). Indeed, it is surprising that little empirical research has been conducted to understand 
the role of video surveillance in the public library setting (B. C. Newell & Randall, 2013). 


2.2 Video Surveillance and Crime Reduction 


Studies analyzing the impact of cameras on crime rates have typically involved systems installed in publicly 
accessible urban areas such as city streets or shopping centers. Research has been conducted in a variety of 
locations, including the United Kingdom (Gill & Spriggs, 2005; Welsh & Farrington, 2002, 2004a), United 
States (Cameron, Kolodinski, May, & Williams, 2008; King, Mulligan, & Raphael, 2008; La Vigne & Lowry, 
2011; Schlosberg & Ozer, 2007), and Europe (Lomell, 2002; Sætnan, Lomell, & Wiecek, 2002). Despite the 
fact that crime prevention has typically been the preferred policy basis for governmental and private 
installation of cameras (Webster, 1998), these studies generally indicate that video cameras have little or 
no statistical effect on incidents of crime (Biale, 2008; Webster, 2009; Welsh & Farrington, 2004b). Webster 
argues that video surveillance systems do not prevent crime and that the evidence base does not support 
the continued expansion and use of video surveillance on the basis of crime prevention alone (Webster, 
2009). Webster and others have also argued that the purposes and uses of video surveillance systems have 
been shifting over time, becoming a normal and widely accepted aspect of modern society, allowing unabated 
diffusion of video surveillance systems regardless of the evidence that their oft-promised crime prevention 
capabilities may be mythical in actual practice, and despite serious implications for the civil liberties of the 
local citizens (Webster, 2009). The theory of “surveillance creep” is premised on the idea that “the policy 
focus of video surveillance has shifted as the technology has diffused, from crime prevention, to community 
safety and now also to national security” (Lyon, 1994; Webster, 2009). 


2.3 Privacy and the Legal Basis for Governmental Surveillance 


A number of federal court decisions in the United States have reaffirmed the right of government to monitor 
publicly owned spaces, as long as the surveillance does not capture areas where a reasonable expectation of 
privacy (measured both subjectively and objectively) exists (United States v. Jones, 2012; United States v. 
Knotts, 1983). In some cases, however, federal courts found that video surveillance violated Fourth 
Amendment guarantees against unreasonable searches in middle school and police station locker rooms 
(Bernhard v. City of Ontario, 2008; Brannum v. Overton County School Board, 2008), as well as a shared 
physical education teachers’ office adjacent to a school locker room (Doe v. Dearborn Public Schools, 2008), 
because the respective plaintiffs maintained reasonable expectations of privacy in those spaces. However, 
the stronger legal basis for governmental surveillance in the public areas of a library, compared to the 
obviously more private nature of locker rooms and personal office space, is based on the premise that 
individuals do not maintain any objective expectation of privacy in their conduct in these public spaces, 
and that these surveillance systems represent a valid use of state power to protect public safety (Nieto, 
1997). As a result, video surveillance in these areas is generally permissible (Carson, 2010; Nieto, 1997; Sher, 
1996). 

Despite the fairly clear legal basis for video surveillance in libraries in the United States, legal 
scholars have also noted the potential chilling effects that such systems may have on speech in public spaces 
(Nieto, 1997; Sher, 1996; Slobogin, 2002). Some commentators have argued that, because video surveillance 
raises the problem of the “unobservable observer”, where the watched do not — or cannot — know who is 
watching or for what purpose, national or local policy ought to require more overt surveillance practices, 
public disclosure, and independent oversight of control rooms (Goold, 2002). 

The United Kingdom takes a view of surveillance in public spaces similar to that of the United States. During 
a 2008 House of Lords debate, Lord Bassam of Brighton, spokesman for the Home Office and Attorney 
General, made the following statement: 


“There are no legal restrictions on photography in a public place and no presumption of privacy 
for individuals in a public place. There are no current plans to review this policy.” (House of Lords 
Debate, July 16 2008) 


In the UK the use of data collected by any video surveillance system is covered entirely by a single piece of 
legislation, the Data Protection Act (1998). The Data Protection Act’s remit is to cover any data held on 
an identifiable living person; this includes video surveillance in public places. The law was passed to bring 
the UK in line with the EU Data Protection Directive (1995), but, unlike the EU Directive, the 
law itself makes no explicit reference to privacy. Under the Data Protection Act, anyone using a CCTV 
system for any purpose other than protection of a private residence must register with the UK Information 
Commissioner’s Office (ICO). The role of the ICO is to make sure that institutions and businesses comply 
with the Data Protection Act and to protect the rights that individuals have under the act (CCTV 
Regulations, ICO, 2008). 

Interestingly, unlike US law, which requires a court order for law enforcement to gain 
access to personal information held by a public body such as a library, under Section 29 of the Data 
Protection Act UK law enforcement agencies can gain access to this information freely, without the 
involvement of a judge (Data Protection Act, HMSO, 1998). Specifically, Section 29 states that personal 
data held for the purposes of the “prevention or detection of crime” or the “apprehension or prosecution of 
offenders” is exempt from the protections of the act. However, this would appear to 
contradict elements of the UK Human Rights Act (1998), which upholds a British subject’s right to 
privacy, even in public spaces (Gras, 2002). 

Building on this prior research, we investigated a number of libraries to discover their 
surveillance practices, policies and interactions with local law enforcement in relation to cameras in order 
to answer the following research questions: 


RQ 1: What factors do libraries take into account when they design, implement, or change official 
policies related to on-site surveillance practices? 

RQ 2: How do libraries balance the privacy of their patrons with the safety and security of their 
employees, patrons and facilities? 


3 Methodology 


3.1 Locations 


We investigated the surveillance activities of a number of libraries in the United States (n=3) and the 
United Kingdom (n=1). In the U.S., we selected three libraries from two different states. Two of these were 
smaller urban libraries in a state in the American Southwest and one was a large county-wide library system 
in the Pacific Northwest. Each of these three libraries was chosen because of its proximity to the 
researchers at the time of the research, its willingness to participate, and the fact that it was using (or 
had previously used) video surveillance as a part of its security strategy. We also limited our selection to 
libraries that were members of the Urban Libraries Council. In the U.K., on the other hand, we investigated 
the video surveillance practices at a large special collections library, which was chosen because of its unique 
history of utilizing surveillance technologies, which we thought would provide an interesting contrast to the 
experience of libraries in the U.S. 


ID   Location                  Branches   Collection Size / Circulation   Service Pop. 
A    US - Southern             1          180,000 / 1.4M                  116,000 
B    US - Southern             3          240,000 / 1.4M                  113,000 
C    US - Pacific North West   46         4.1M / 22.4M                    2,000,000 
D    United Kingdom            2          150M / NA                       63,000,000 


Table 1: Research Locations - Size, Circulation and Service Population 


3.2 Procedure 


At each of the four libraries, the researchers conducted semi-structured interviews with library 
administrators and analyzed documents and emails available publicly or by request under local freedom of 
information laws. We used these methods to gather as much detailed information as possible about each 
library system’s video surveillance policies and the reasons behind the implementation and changes made 
to those policies in succeeding years. All participants were fully aware of the library policies concerning 
video surveillance and were actively involved in the decision-making processes regarding the operation, 
installation, and, in one case, the ultimate removal of the cameras. Following each interview, we requested 
additional library documents and emails related to the surveillance cameras under local freedom of 
information laws. We analyzed these documents thematically, comparing the data with our interview 
transcripts in an effort to help ensure trustworthiness and validity, and to attempt to triangulate our 
findings and conclusions. 


4 Findings 


4.1 United States 


4.1.1 Library A 


Both of the smaller urban libraries from the American Southwest (libraries “A” and “B”, respectively) had 
security cameras installed at the time this research was conducted. Library A has installed and 
maintains only a single camera, located in a stairwell. The camera was installed to alleviate the safety concerns 
of library administration because the stairwell was otherwise an unmonitored area away from public (and 
staff) view. However, the feed has never produced any needed security footage as little unwanted behavior 
has reportedly occurred on the upper landing of the stairwell (the area under surveillance). The feed from 
the camera is displayed at a reference desk, and is visible to staff and patrons — although the screen is small 
and the display was not functioning properly when the researchers were visiting the library. The camera 
itself was installed as part of the overall plan and construction of the building, and not in response to any 
specific incident. 

Library A also shares its building with City Hall. Because of the dual purpose nature of the building 
space, the city utilizes its own security cameras in the building’s atrium (which serves as the main entrance 
to both the library and City Hall); these cameras overlook the library’s entrance and can view portions of 
the ground floor area of the library near the entrance. Despite never using its own solitary camera footage 
for security purposes, Library A did report several incidents involving footage from the atrium cameras. 
For instance, the library administrator stated that footage from the atrium cameras had been utilized by 
the library to investigate and identify an individual suspected in “a chronic theft problem” and to look into 
allegations made by patrons against other patrons or library staff members. In these cases, law enforcement 
requested footage from City Hall directly, although the library was made aware of the investigations. On 
other occasions, the police had requested atrium footage from library staff, who directed them to City Hall. 
Because the library does not control the atrium cameras, it is likely that other requests for footage have 
been made directly to the city without the library’s knowledge. 

The library does not have an official written policy dealing with the installation or use of video 
cameras, but the administrator noted that the city’s general push towards increasing levels of surveillance 
and automatic police access to many city camera feeds might cause the library to seriously discuss the 
implementation of a policy in the future. This potential need would be especially pronounced, according to 
the library administrator, because any future location would be away from “the umbrella of City Hall,” and 
the ban on concealed weapons afforded by the library’s current location in the shared building. The 
administrator stated, “I suspect [we are] going to move more toward surveillance than away from it, just 
simply because of the active shooter issues, as [this] is a concealed handgun state,” and the library has been 
actively preparing for active shooter incidents. Additionally, the administrator noted that there was no 
explicit policy detailing who could request footage or whether footage was part of the patron record, adding 
that “we need one.” 

Although the stairwell camera was primarily installed to ensure safety in the stairwell, the 
administrator stated that, “we default to privacy” when considering the impact of library security policies 
and practices. However, when asked about the proper role of video surveillance in the public library setting, 
a library administrator stated, “I am at war with myself” due to the competing safety and privacy concerns. 
The library also does not have any signs posted to inform patrons or other visitors about the presence of 
the stairwell camera, and no signs exist in the atrium area controlled by the city government either. This 
decision was purposeful, as the library was concerned that behavioral signage (also including “no smoking” 
signs) would be “counter to... how we want our space to feel.” 

Library A has also never had any issues with public backlash against the existing cameras in either 
the atrium or the stairwell. The administrator noted that, “what I’m observing from our patrons is that 
they care less about their privacy than we do.” Additionally, as the library has begun talking with patrons 
about the option to have borrowing histories and search patterns saved, patrons have largely preferred the 
convenience over any risks to their own privacy. As a result, the library is concerned about “responding to 
[its] customers appropriately” as it considers new electronic services — so-called library 2.0 services. 


4.1.2 Library B 


Library B, the second library from the American Southwest, has a network of cameras in each of its branches, 
which it has introduced over the last four years. The cameras cover the circulation desks, the entrances to 
toilet areas, and the exterior of the buildings. The cameras were installed on an as-needed basis, beginning 
with the main branch and then spreading to the other two locations, primarily to protect staff and patron 
safety. The library does not have a written video surveillance policy. It views the cameras as a tool 
for “safety and security” and cooperates fully with local law enforcement. 

The library has on several occasions used camera footage to support the prosecution of crimes or to 
identify persons who have been involved in disturbances in the library. In each case it voluntarily released 
the needed footage to local law enforcement and did not require a court order, as it does not consider the 
camera footage to be a part of the patron record. However, the library will not release footage to members 
of the public without law enforcement involvement. In terms of the library’s relationship with local law 
enforcement, the administrator stated, “we love law enforcement; anytime they want to come in [to the 
library] is fine with me.” 

The library administrator noted that the cameras didn’t seem to create many privacy issues for 
patrons, and that the cameras at the circulation desks were positioned “so that they don’t catch titles, you 
can see the activity happening at the desk... but it’s not clear enough, or close enough to see what it is 
exactly they [patrons] are doing.” In a public library, stated the administrator, “you don’t have an 
expectation of privacy.” The administrator also stated that libraries implement security camera systems to 
ensure safety, and that, “you have to work in [a library] to understand the reality of what it is to work in 
the public library... I’m all for privacy, but safety trumps.” At the time of the interview, several criminal 
prosecutions that relied on library video footage were underway in local courts, and despite the lack of 
conclusive information about whether the presence of cameras has deterred crime, the administrator noted 
that, “we certainly catch the thieves.” 

In addition to their active cameras, the library has also placed several dummy cameras in certain 
locations in direct response to theft or unwanted activity. As stated by the administrator, “occasionally we 
have people stealing the newspapers, and we have dummy cameras over the children’s [and] teens’ video 
and audio books because we had a rash of thefts.” The library makes use of signage to warn patrons they 
are being recorded; some of these signs are located next to the dummy cameras. Additionally, the library has 
installed mirrors in various places and provides access to live video feeds at the public services desks to 
allow library staff to monitor the library space for unwanted activity. Although these feeds are not actively 
monitored, and cannot be accessed by the general public, staff does have access on demand to view the 
feeds. In addition, the feeds can be accessed by any employee on any library workstation using specially 
installed software, which also allows several of the exterior cameras to be remotely controlled. Moving 
forward, the library administrator stated that “it would be in our best interests to have the entire library 
covered [with cameras]... for safety... and I think that it makes sense that we should have a security camera 
focused on the children’s areas specifically.” 


4.1.3 Library C 


Library C is a large library system that includes both urban and rural library locations in the Pacific 
Northwest. Prior to May of 2011, 10 of the library’s 46 branch locations had camera systems installed. In 
May of that year, the administration simultaneously removed all of the cameras under its control. The 
library system no longer manages or operates any video surveillance cameras at any of its locations. 
However, one location, which maintained a video security system prior to its annexation into the county- 
wide library system, continues to have cameras installed on the exterior of its building, although these 
cameras are now run by the city government directly. Additionally, in early 2012, another city government 
independently installed cameras at a building that the library shares with that municipality’s City Hall. 

The cameras at all 10 branch locations were generally installed in response to repeated incidents of 
crime occurring in and around library buildings. Cameras were installed in the first branches in the late 
1990s, as branch managers responded to specific incidents of criminal activity or safety concerns. For 
instance, cameras were “used in some of the libraries before [staff] left the building to see if there was 
anybody around the building, so [a staff member] could get to their car.” When the cameras were present, 
the library maintained a policy that all law enforcement requests for footage must be accompanied by a 
court order or subpoena. In our interview, the administrator noted, “We did give the video if it were after 
hours and there was no one in the parking lot... But if there was somebody in the parking lot, they could 
be using our WiFi... then we would still require the subpoena.” 

The library maintained this policy because its interpretation of the library records exemption to 
the state public records act held that video footage was part of the patron record — a position backed by a 
legal opinion from the library’s attorney. However, this policy became a point of contention with multiple 
law enforcement departments. As the administrator in charge of handling requests stated: 


“What [the police departments] would do is they would, you know, basically try to get the front 
line people to turn it over to them by making them feel bad that they weren’t helping them... solve 
this crime and so they put a lot of pressure on them.... It really is bullying behavior.” 


Ultimately, library legal counsel expressed some reservations about whether footage from the exterior 
cameras was actually exempt under the library records exemption to state public disclosure law, leading 
the administration to admit that it was not “totally comfortable that [their interpretation of the library 
records exemption] would be upheld in the courts.” 

The administration’s concern about the library’s use of the camera systems was an on-going issue, 
but the impetus for the decision to remove cameras came in March of 2011 when a conflict arose with a 
local police department after the library demanded that police obtain a court order before the library would 
turn over camera footage of an assault in the library parking lot. This particular situation became further 
aggravated when police finally obtained a court order for the footage a week later and publicly stated that 
they had apprehended the suspect, a known transient, within 15 minutes of an officer viewing the footage. 
Shortly after the March 2011 incident, the administration set up a team to conduct a “critical review of 
security cameras to gauge the impact and effectiveness of the cameras and whether they are appropriate to 
our mission of protecting patron privacy and confidentiality.” The administration also discussed the issue 
with its legal counsel and library managers and conducted research into the effectiveness of video 
surveillance as a crime prevention tool. The administration announced its decision in a memo to library 
staff, which read, in part: 


“We cannot argue with the sentiment that cameras make some people feel safer.... However, the 
potential impact to our mission to provide equal and open access to the library with protection of 
privacy and advocacy of intellectual freedom are too great to continue to provide security cameras.” 


The library quickly removed all cameras under its control at its ten branches. Two branches continue to 
have cameras, but these are not owned, operated, or maintained by the library and do not primarily focus 
on the libraries. 


4.2 United Kingdom 


4.2.1 Library D 


Library D has one central location and an additional location used to house part of its collection; this second 
location has a small reading room with limited public access. The central library has an extensive network 
of cameras. The library was built with an analogue system in the late 1990s and a constant program of 
upgrades and expansion has been taking place ever since. The cameras cover both internal and external 
locations within the library grounds and the entire system is controlled from a central control room, manned 
24/7, where feeds are only accessible to members of the library’s security team. However, there is one 
exception: feeds are shown to the public at the entrance to the library and the entrance to the reading 
rooms. The library was designed with cameras in mind, following a philosophy that “the cameras will solve 
all of your problems.” 

The library views its use of the CCTV cameras in three main ways. Firstly, the cameras are used 
to maintain the library’s external perimeter so that security staff can monitor access to the library site, 
particularly at night: “It’s not an un-scalable perimeter, we haven’t got prison walls, but it’s just a 
demarcation, there is our fence and if you come across our fence then we’re going to ask you what you’re 
doing.” The exterior cameras are motion activated at night, to assist security in identifying sectors where 
an intruder may be trying to scale the fence. The second use of the CCTV cameras is to monitor the 
public areas of the library and observe what is happening in the library. Cameras are installed in all areas 
of the library with the exception of public restrooms. The highest quality cameras are within the reading 
rooms and are constantly monitored. Here they provide backup to the security guards when they go to an 
incident, “the aim being that if I’m questioning you about something there is a camera watching what’s 
going on so that we have got a record of who threw the first punch.” The third level is the CCTV use 
within the reading rooms inside the library. The library has a number of different types of reading rooms, 
from low security to high security, as well as a specific room for scientific journals. In the high security 
reading rooms the camera density is much higher and the recordings are kept for longer as the library is 
trying to hold a record of “what happened at each desk at a particular day.” The lower security reading 
rooms merely have area surveillance and the feeds aren’t as high quality. All feeds from all cameras are 
kept for a minimum of 31 days and a maximum of 1 year. Data is stored on site on secure servers, to which 
only the security team has access. 

Overall, the library has a very limited level of active surveillance. The number of cameras and 
patrons far exceeds the capacity of the one or two on-duty security officers to monitor everything. As a 
result the library views the cameras as “largely a historical record that can be used after something has 
been detected.” The security team’s main role in active surveillance is therefore to monitor the reading 
rooms for patrons breaking the code of conduct, for example using a pen in the reading rooms, which is 
strictly forbidden, and also to monitor exterior areas for thieves. The level of surveillance at the library is, 
as established, very high, even if the level of active surveillance is somewhat limited. Patrons who enter a 
reading room sign up to accept the terms and conditions of entering that room, part of which is to accept 
being filmed by the cameras. The library considers there to be no expectation or right of privacy in the 
reading rooms, so that it can protect the integrity of the collection. However, the library operates a 
strict Code of Practice that limits who is able to view footage and how that footage should be handled; 
violation of that policy is a dismissible offence. Outside of the reading rooms, surveillance is much more 
cursory; security is handled mostly by roving guards. 

The library is fully compliant with the Data Protection Act and as a result requires no court order
or warrant from the police or the security services when they request video surveillance footage. Police fill
out a form and sign for the footage under Section 29 of the Act to gain access to any material they wish.
This Section 29 interaction is the library’s primary, and usually only, direct interaction with law
enforcement, with the exception of when staff may be asked to provide written statements due to an incident
that occurred on the premises. Most incidents of crime that occur on the library grounds are related to theft
(personal theft and bike theft) and are dealt with by the library security staff themselves without police
involvement. The library uses a system called Facewatch, which allows the security staff to upload
information about a crime directly to the police, including witness statements, photos, and CCTV footage.
The system then generates a crime reference number for the individual, enabling them to claim on insurance,
and in the case of the theft of a wallet it will cancel all of the victim’s credit cards. Since the library was
opened, minor theft has been its main security issue, along with several serious assaults outside of the
library recorded on its cameras. Only one major incident related to the reading rooms and directly
involving the police has occurred. That incident involved a book dealer removing and stealing pages from
rare books.


5 Discussion 


ID | Camera Status      | Installation Reason                                    | Primary Purpose of Camera         | Requires Warrant?
A  | Single Camera      | Safety Concern at Building Construction                | Specific Safety Issue             | Yes
B  | Multiple Cameras   | Reactionary to Specific Incidents                      | Crime Prevention                  | No
C  | None - All Removed | Reactionary to Specific Incidents                      | Staff Safety and Crime Prevention | Yes
D  | Multiple Cameras   | Safety & Preservation Concern at Building Construction | Preservation of Collection        | No

Table 2: Research Locations - Findings Summary 


In this paper, we have focused on the role of video surveillance in a public library setting,
and the findings of this study are consistent with and, in a few places, stand in stark contrast to the expected
increase in the adoption and implementation of modern surveillance technologies in our society. The libraries
studied in this research implemented video surveillance systems primarily in response to either 1) staff 
safety concerns related to criminal activity occurring in or around library buildings or 2) general plans to 
construct new library buildings. In some cases, library staff ultimately utilized the cameras for various other 
purposes as well, including ensuring patron and employee safety and protecting library property. This 
finding is consistent with the idea of “surveillance creep”; that is, that the controllers of the surveillance 
systems begin to use the systems in ways not originally planned for or considered. 

Our research shows clear variety in how different library systems treat video footage.
Libraries A and C both erred more on the side of privacy. Library C was very
explicit about this in regard to video footage, whereas Library A’s single-camera setup limited its need for an
established policy, although staff indicated that the installation of more cameras would force them to head
in that direction. Libraries B and D, conversely, took a different view. For Library B this was down to the
choice of the library system itself, which decided to cooperate fully with law enforcement requests for footage
while purposefully designing its camera setup not to capture patron borrowing details where possible.
In the case of Library D, policies are enshrined in law and nothing is privileged; its cameras are even
capable of capturing handwritten notes made when reading a book. Furthermore, the seeming contradiction
between the Data Protection Act and Section 8 of the Human Rights Act in relation to privacy in public
places is of some concern. The library essentially makes patrons give up their right to privacy in order to
access a public institution, which by definition the public should have a right to use freely. It is in fact very
troubling that UK law enforcement agencies have such ready access to patron information, not limited to
CCTV footage but also including borrowing records, without any requirement for probable cause or
judicial review.


6 Limitations and Direction for Future Research


The primary limitation of this study has been its scope. While we have a number of institutions within the
U.S., we have only one within the U.K., and the U.K. library, as a special collections library, is not directly
comparable with the more general purpose mission of the others. With the established differences in
legislation and practices between the two countries, a more expansive study is required. This is also true of
the number of libraries studied in the U.S., and in future research we hope to study libraries from more
regions of the country. Additionally, our research is limited to information provided by library
administrators and publicly accessible documents, and does not represent the views of library patrons, or
even library staff members more generally.

In the future, we intend to pursue additional research into the effects of video surveillance on library 
access for poor and underserved populations that may be particularly impacted by library surveillance, and 
to conduct research with additional libraries that have implemented video surveillance systems. Expanding 
our current findings in these ways will enable us to make claims that are generalizable beyond the scope of 
this current study, gain a broader and more comprehensive understanding of the issues involved when 
libraries implement video surveillance, and further triangulate our data collection and analysis efforts. 
Because of differing theoretical definitions and practical approaches to the concept of privacy across national 
boundaries (Newell, 2011), we are also planning to extend our study to look at other countries in Europe 
and North America. 


7 Conclusion


Library surveillance may take many forms, including traditional reading and borrowing histories, RFID 
tracking, e-book borrowing choices visible to outside vendors like Amazon and Barnes and Noble, electronic 
and web-based communication and interaction between patrons and library staff, Internet browsing 
histories, and video surveillance. The accumulation and aggregation of these forms of surveillance data can 
potentially pose a threat to the privacy of library patrons and staff in conflict with library commitments to 
privacy and intellectual freedom, especially if libraries do not establish policies to ensure prompt deletion 
or when local or national laws may not adequately protect all these forms of library records. The idea that 
“cameras will solve all your problems,” without more, is disingenuous, and without adequate protections 
for personal information privacy in public spaces (including privacy in information accessible through 
aggregation of data captured by a variety of surveillance mechanisms), library commitments to intellectual 
freedom and patron privacy are tested and stretched to their limits. 

Surprisingly, libraries inconsistently adopted written or explicit verbal policies outlining the 
installation and use of video surveillance systems. The libraries in this study also differed in their conclusions 
about whether surveillance footage should be considered part of a patron record and protected from 
disclosure absent court order or some other judicially sanctioned process. The libraries also approached their 
working relationships with local law enforcement quite differently and, due perhaps partly to more 
restrictive policies for releasing footage to police, one library ultimately decided it was in its best interests 
to remove its entire video security system from 10 branches. 

The adoption of surveillance technologies and the culture within the United States and the United 
Kingdom exhibit some similarities (e.g. few privacy protections for individuals in public spaces) as well as
some historical and legal differences. In the United Kingdom, legislation requires very strict policies and 
protections for data on individuals, while at the same time allowing law enforcement to act to retrieve such 
data without probable cause. It’s perhaps most revealing that the comprehensive law providing protection 
of personal data within the United Kingdom does not once use the word ‘privacy’ within its statutes. The 
continued and unclear application of Section 8 of the Human Rights Act (1998) in relation to personal data 
is also a contentious issue, especially in regards to its use in conjunction with the Data Protection Act.

It is clear that surveillance in public libraries is an expansive topic, of which this study only begins
to scratch the surface. It is hoped that continued study and a greater expansion of this research will help 
to reveal the extent to which the privacy of library patrons in Europe and the United States is protected, 
and perhaps reveal potential holes that legal and policy reforms ought to address. In particular, lawmakers 
need to craft and adopt clear and precise legal provisions outlining the extent to which video surveillance 
footage is part of the patron record and ought to be protected from disclosure (to law enforcement or the 
general public). Additionally, libraries should be transparent about their surveillance activities by posting 
signs and alerting patrons to the presence of cameras, and should maintain written policies outlining the 
extent of video surveillance, policies related to the retention and destruction of recorded footage, the 
potential uses of video footage, and the processes and procedures required prior to disclosure to third parties, 
including law enforcement. Ideally, these policies would generally require judicial authorization (e.g. a 
warrant or court order) prior to allowing police access to footage that could potentially reveal a great deal 
about patron browsing and borrowing activities.


Abstract

iSchools have been steadily advancing data curation education and practice in response to workforce 
demands. This paper reports on a formative evaluation of the Specialization in Data Curation at the 
University of Illinois, aimed at understanding job preparedness and work experiences of graduates and 
areas for improvement in data curation education. Survey results are complemented by additional 
graduate placement analysis. Employment and career satisfaction were high. Internships, practicum, and 
assistantships were considered key employability factors. Duties emphasize liaison and consulting, user 
instruction, data management, metadata, and policy development. About half of all placements were in 
academic libraries, with the second largest group in the corporate sector. This study, focused on the 
earliest formal LIS program in the U.S. dedicated to curating research data, provides important evidence 
of data curation responsibilities in the workforce and perceived educational gaps that can guide planning, 
design, and improvement of data curation programs.


1 Introduction


Education for information professionals has been evolving for many years to meet the challenges of digital 
content and infrastructure growth and complexity. Programs have emphasized different aspects of the 
profession, including digital librarianship, digital preservation, data stewardship, and digital curation. As 
documented in Gold (2010), “data curation” education and practice has been steadily advancing in the field 
of Library and Information Science (LIS) since 2006. However, the emergence of the field was recognized at 
least a decade earlier by both government agencies and the museum community (Palmer, et al., 2013). It 
gained momentum as scientists acknowledged the need for curation to sustain contemporary research (Gray, 
et al., 2002) and organizations emerged to promote best practices (Lord, et al., 2004). 

Responding to the expected demand for expertise in the curation of research data, the Graduate 
School of Library and Information Science (GSLIS) at the University of Illinois began a Specialization in 
Data Curation in its MSLIS program in 2007. The specialization was created through a 2006 grant from 
the Institute of Museum and Library Services (IMLS) to develop educational capacity in the field of data 
curation, with an initial focus on the sciences. It was extended to include the humanities with a second 
award in 2008. To date, 63 graduates have completed the specialization. 

This paper reports on a formative evaluation of the program, primarily a survey of graduates with 
the specialization, aimed at understanding work experiences, job preparedness, and areas for improvement 
in the program. Survey results are complemented by analysis of placement patterns of the graduates and 
the emergence of new kinds of positions for information professionals with responsibility for digital content.
Together, the different evaluation components provide a baseline for longer term tracking of the market
and assessment of the specialization. More importantly, the results offer important benchmarks for other
iSchools as they plan, develop, or advance educational programs in data curation. As an evaluation of the
earliest formal LIS program dedicated to the curation of research data, it provides important evidence of
actual data curation responsibilities in the workforce and perceived educational gaps, to enable better
recruitment and design of programs to meet new data demands in the information professions.


2 Data Workforce Needs 


A well-prepared and trained workforce is the key to managing and preserving data to advance science and 
scholarship, as asserted in numerous reports issued by federal agencies, including the ACLS (2006) and the 
NSF Blue-Ribbon Advisory Panel on Cyberinfrastructure (Atkins, 2003). Data curation requires a workforce 
with specific knowledge and skills to manage and preserve data to be scientifically useful to others 
(Rusbridge, 2007). The next generation of science needs professionals with expert capabilities to select and 
store data; support the discovery, access, and use of data; and ensure data integrity over time (Lord & 
Macdonald, 2003). As essential intermediaries between domain scientists and computer scientists in the 
system of cyberinfrastructure (Bowker & Star, 2009), data curators will be the experts that ensure that 
data are available for public access and fit for reuse. 

With advances in technology and the changing conduct of science, new professional roles have 
emerged as expected (National Science Board, 2005; Hey, Tansley, & Tolle, 2009). Positions such as data 
curator, data archivist, data scientist, and data journalist now exist (Lyon, 2013), and recent growth of 
data curation positions has been documented (Maatta, 2012; Sierra, 2012). New positions in the area of 
data science, where discoveries are dependent on curated data (Stanton, et al., 2012), have attracted 
national attention by being named “the sexiest job” in 2012 (Davenport & Patil, 2012). However, despite 
calls for a more precise analysis of the data workforce needs and responsibilities (Varvel et al. 2010), little 
is known about how data curation roles are currently emerging in the workforce. 

A number of important workforce studies have been conducted concurrent with data workforce 
changes, but unfortunately they have not been designed to identify trends specific to data roles for LIS 
professionals (Marshall et al. 2010; Sivak & De Long, 2009; Griffiths, 2009; Steffen, Lance, Russell & Lietzau, 
2004; Walch, 2006). Moreover, they tend to not represent information professionals working outside of the 
traditional LIS settings. 


3 LIS Data Workforce and Education Trends 


Positions in data curation have proliferated while education capacity has developed more slowly, despite
clear predictions on demand for data curation expertise in LIS:


"Library educators have an important role to play in planning for and delivering appropriately 
skilled people to meet the latent demand for data librarians to manage the libraries’ potential data 
curation role. Yet very few library and information science schools currently teach the skills that 
future data librarians will need." (Swan & Brown, 2008, p. 25) 


Influenced in part by recent funding agency requirements for data management planning (Reznik-Zellen et 
al., 2012; Lyon et al., 2013), new responsibilities in research libraries have resulted in a range of new job 
titles with increasingly diverse data responsibilities (Bracke, 2011; Xia and Wang, 2013). The expansion in 
expected expertise for information professionals adds to a continued struggle for LIS identity and recognition 
(Fisher and Julien, 2009; Gray, 2013; Higgins, 2011), complicated by the fact that data responsibilities are 
closely intertwined with other towering professional roles in the digital realm, including information 
gatekeeping (Cox, 2013), and building information and knowledge infrastructure (Edwards et al., 2007; 
Monteiro, 2012; Soehner, Steeves & Ward, 2010; Edwards et al., 2013).

Library roles in data curation are necessarily evolving alongside those of digital curation (Gold,
2007; Gold, 2010), as documented in recent studies of job advertisements of library positions (Park and Lu,
2009; Kim et al., 2012) and investigations of how library professionals feel about the new roles and library
preparedness for providing data services (Tenopir et al., 2013). Case studies and the professional discourse
are beginning to make explicit the institutional and university level requirements and experiences in building
the digital data enterprise necessary for research and digital initiatives (e.g., Walters, 2009; Lage et al.,
2011; Prom, 2011; Hswe et al., 2012; Newton et al., 2012; Jahnke, Asher, & Keralis, 2012; Illinois Research
Data Initiative, 2013; Reznik-Zellen, Adamick, & McGinty, 2012; Tenopir et al., 2013).

Nonetheless, LIS education capacity is still uneven across schools, although the central role of LIS 
in digital information management and data curation has been acknowledged as a future thrust in the field 
(Heidorn, 2011). According to Harris-Pierce and Liu (2012), only a third of the LIS programs offer a course 
in data curation at the graduate level, with content addressing information resources, information 
organization, metadata, and technical knowledge and skills. An analysis of national data curation curriculum 
in LIS schools identified a total of 203 programs at 63 universities offering courses relevant to data curation, 
but most appeared to be part of digital library curriculum that covers digital content in a more generic 
way, with only a few schools offering programs concentrating specifically on contemporary demands of the 
data workforce (Varvel, Bammerlin & Palmer, 2012). While not yet empirically documented, many LIS schools and
iSchools have since made significant progress on new programs. Two examples are the Data Curation
emphasis within the Post-Masters Certificate at the University of North Carolina and the specialization in 
Curation and Management of Digital Assets at the University of Maryland. 

Activity in continuing education for working professionals has progressed in parallel, offered by a 
variety of institutions and in a variety of formats. Since 2006, a sustained series of institutes has been 
offered in data curation at the University of Illinois (Renear et al., 2012), and in digital curation at the 
University of North Carolina (Hank, Tibbo & Lee, 2010). At the same time, non-LIS schools are quickly 
building capacity in data science education, including online offerings to accommodate working professionals 
(see, for example, Howe, 2012). Professional organizations and premier data centers are also providing 
outreach in best practices and tools for data curation, such as the institutes in data management sponsored 
by the Inter-university Consortium for Political and Social Research (ICPSR, 2012), the e-Science Institutes 
offered by ARL and CLIR/DLF, and the extensive and diverse set of activities and resources sponsored by 
the UK Digital Curation Centre. 

Data curation education in LIS has focused on preparing new students as well as extending the skill 
set of current professionals in the workforce. A diversity of programs exists in terms of length, delivery 
modes, and level of certification or specialization. The program at Illinois reported here was incubated as 
the Data Curation Education Program (DCEP), and supported through IMLS grant funds, as were many 
other educational efforts referenced above. In addition to developing the Specialization in Data Curation in 
the master’s program, DCEP produced research on education and workforce needs, and supported the
Summer Institute in Data Curation for working professionals with events focused on the sciences and 
humanities data. With years of development and delivery of data curation education now completed, we 
can begin to assess outcomes within the context of the broader trends in LIS and the workforce at large. 
This report is one piece of the field’s coming efforts to determine our goals and document our achievements 
in data curation as we move into the next generation of information professions, where data expertise will 
undoubtedly be a major part of what we contribute to our research institutions and society. 


4 Methods 


The formative evaluation design and survey development were guided by the following research questions:


1. What are current data curation workforce needs and future trends from the perspective of
graduates?
2. What are the work experiences of the graduates of the Specialization in Data Curation program?
3. How well does the program prepare graduates for their jobs?


In developing the survey, the team reviewed questions from national and large-scale workforce surveys such 
as WILIS (Marshall et al. 2010), A*Census (Walch, 2006) and 8 R’s (Sivak & De Long, 2009). A web-based 
survey was developed employing both closed and open-ended questions covering program assessment, 
employment status, job characteristics, career intentions, continuing education needs, and future trends in 
data curation. It also collected information on the respondents’ current employer and on the data resources 
they are responsible for in their positions. 

The survey was distributed in April 2013 to alumni graduating from December 2008 through 2012,
applying a census sampling strategy; i.e., all 63 graduates received an email invitation to the survey. After
two weeks, a reminder was emailed to non-respondents. The response rate was 37% (n=23). Although
web surveys tend to have lower response rates (Hayslett & Wildemuth, 2005), they have been reported to
produce higher quality responses than offline methods (Gunter et al., 2002). The survey data provided highly
informative and valuable indicators for considering next steps for the program and for iSchools interested
in beginning programs.
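
The reported response rate can be checked with a short calculation. The following is a minimal sketch in Python that assumes only the two counts given above (63 invitations, 23 responses). The function name is illustrative and is not part of the study's own analysis.

```python
def response_rate(responses: int, invited: int) -> float:
    """Return the response rate as a percentage of invited participants."""
    if invited <= 0:
        raise ValueError("invited must be positive")
    return 100.0 * responses / invited


# Counts reported in the text: 63 graduates invited, 23 responded.
rate = response_rate(23, 63)
print(f"Response rate: {rate:.1f}% (rounds to {round(rate)}%)")
```

Rounded to the nearest whole percent, this gives the 37% stated in the text.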

The survey was also conducted in conjunction with ongoing placement analysis of graduates from 
the data curation program. Placement information, including current job title and employer, has been 
recorded for 84% of the 63 students graduating with the Specialization in Data Curation. 

Quantitative data were loaded into R 3.01 software for analysis. Textual responses were analyzed
using ATLAS.ti 7. Analytical codes were developed using both an inductive and a deductive approach.
Open-ended responses were coded initially to identify emerging themes. Next, the authors reviewed the research
questions and literature to generate additional codes. A codebook was created with the final set of codes for 
analysis. Two team members coded the data in a process for achieving inter-coder reliability. The survey 
instrument will be archived in the IDEALS repository (Thompson et al., 2013). This paper presents results 
from the quantitative and qualitative analyses arranged by topic areas. 


5 Results 


Results are reported for the following areas: respondent demographics, current employment, careers, 
program assessment, continuing education needs, and future trends. 


5.1 Respondent Demographics 


Survey respondents graduated between 2008 and 2012, with the majority (57%) graduating after 2010.
Graduates were primarily female (63%), with a median age of 34 years (mean 35; std. dev. 9). By comparison,
the median age category of recent LIS graduates in the study by Marshall et al. (2010) was 31 to 35 years,
and, as of the year 2010, the median age of the US labor force was 41 years (Toossi,
2012). Seventeen percent of our respondents were non-Caucasian, a somewhat higher percentage of
minorities than the 10%-11% from earlier surveys (Marshall et al., 2010), an outcome of the DCEP
program’s efforts to recruit underrepresented students. All respondents were located in the United States,
currently living in 14 different states. Over a third of respondents were working in the Midwest, primarily
in Illinois and Iowa. California was another prominent location.


5.2 Current Employment 

The survey asked graduates if they were currently working for pay. Despite the recent economic recession, 
91% of respondents were employed at the time of the survey, with 2 unemployed and seeking work. The 
survey included questions for employed graduates about their position — whether it was considered full-time 
and considered a data curation position. Of those employed (n=21), all held full-time positions. Forty-eight 
percent considered their position to be in the field of data curation. Graduates indicated whether they had
the opportunity to apply the skills that they learned from the specialization. The majority (95%) of those
employed agreed that they apply skills learned from the Specialization in Data Curation in their positions. 
Of those not working specifically in data curation (n=11), 90% had opportunities to apply their data skills. 

The survey asked graduates to select the employer type from response options that best described 
their current employer. Graduates were working in a variety of settings. Among the most frequently reported 
were positions in an academic setting (55%). A few graduates were working in corporate and non-profit
institutions, and one graduate worked in each of the following settings: research center, data center, and government. These
findings are fairly consistent with our program’s placement information where we found graduates working 
in academic (49%), corporate (17%), and non-profit (15%) settings. Five graduates are working in 
government, three students are now employed in research centers, and two are employed in data centers. 
Of those working in a data curation position (n=10), 6 graduates were working in academic settings, and 1 
each in government, research center and data center. 

The survey asked for graduates’ position titles and whether there was a more appropriate title for 
their current job. Coders analyzed free text responses for traditional LIS and non-traditional LIS positions, 
and found that traditional and non-traditional LIS roles were blended in the positions of many graduates. 
There was a diverse range of traditional library positions with respondents mentioning each of the following 
at least once: systems department head, reference librarian, subject librarian, index specialist, archivist, 
internal records manager, preservationist, cataloger, procedural documenter, and project manager. One 
graduate summarized their job as more ‘liaison librarian’ to disciplines, illustrating the articulation work 
involved: “I am currently arranging for data management support in collaboration with other offices.” 

There were twice as many positions coded as ‘non-traditional’ LIS roles as compared to the 
traditional LIS roles. For this survey analysis, non-traditional LIS roles were defined as those that emerged 
in the last decade such as the management, preservation, or curation of data and digital objects; work with 
digital repositories; and engagement with metrics and communities through social media. Titles associated 
with non-traditional roles included Data Manager, Data Management Consultant, Digital Project Analyst, 
Web Metrics, Social Media Specialist, Systems Architect, and Application Analyst. 

In the survey, respondents selected their current annual salary from a list of salary ranges. Forty 
percent of respondents reported annual salaries between $50,000 and $59,999. A few graduates selected 
salary ranges of $60,000 - $69,999 and $70,000 - $79,999. All respondents indicated salaries of $30,000 or 
greater. 

Job functions that you perform in your position. In the survey, graduates selected from a 
list of 20 duties. Overall, the respondents had positions comprised of several duties. Of the response options, 
most frequently reported duties were liaison and consulting (67%), user instruction (67%), data management 
(62%), metadata (62%), and policy development (62%). Thirty percent had supervisory responsibilities. 
With regard to data, 58% of all employed respondents had shared decision-making authority. Approximately 
37% had some input to decision-making. Graduates working in data curation (hereafter referred to as data 
curators) all had duties in data management (100%), with high levels of responsibility in preservation 
planning (70%), data quality (70%), and compliance (70%). The duties topping the list for graduates not 
working in data curation (hereafter referred to as non-data curators) were training (64%) and consulting and
liaison (45%). 

The survey gathered descriptions of current job duties in two questions — ‘describe the work you 
do in your current position’ and ‘elaborate or specify other duties.’ From analysis of open-ended responses, 
a set of categories emerged to describe the work of data curation professionals: technical, service and 
managerial duties. Technical duties were defined as those associated with development and support of 
digital technologies including software and hardware components. These duties ranged widely from 
managing existing data systems to designing new infrastructure. See Table 1 for examples of job duties by 
type. Service-oriented duties were primarily in community outreach and training. The service audience
included the public, scientists, data users, colleges, and universities. Administration and managerial duties
were defined as work overseeing daily operations, policies, planning, and resources.


Technical duties

  Managing existing data systems
  Designing new infrastructure
  Generating data
  Preserving data (systems)
  Designing interface
  Developing of digital project workflows
  Consulting on data management plans (infrastructure-oriented)
  Analyzing data
  Ensuring quality control
  Developing applications (e.g., mapping data)
  Using tools for asset management
  Modifying existing software tools
  Documenting procedures
  Ensuring security
  Implementing access level
  Complying with policies

Service duties

  Assisting with data management plans at both planning and implementation stages
  Training scientists and data users on best practices
  Providing support in data access
  Engaging with data management, informatics and design issues
  Explicating data management problems

Admin/managerial duties

  Managing personnel
  Coordinating projects
  Managing databases
  Overseeing collections
  Managing systems
  Allocating resources
  Developing policies
  Overseeing operations
  Writing reports


Table 1: Examples of job duties by type 


Describe the data you work with in your position. The survey asked graduates to select the domain 
areas and formats of data that they encounter in their current job. The most frequently reported domain 
areas were Life Science (43%) and Physical Science (43%). A few graduates selected Social Science, Business, 
Government, Technology, and Health. Graduates were responsible predominantly for digital data, such as 
text (57%), images (52%), presentations (48%), videos (48%), spreadsheets (48%), and databases (48%). 
Data curators identified responsibility for spatial data (70%), computation models (50%), computational 
code (60%), and spreadsheets (80%). Interestingly, 27% of those in non-data curator positions had 
responsibility for spreadsheets. 

All but one employed respondent is responsible for digital objects in their current position. As noted 
in the open-ended responses, most work with both digital and physical objects. Four positions focused 
primarily on research data, with only one noting responsibility for ‘Big Data.’ Almost half worked with 
research data and with non-data objects, such as an internal digital library and audio-visual files. For 
graduates working with physical objects, formats included books, paper reports, reel-to-reel tape, and 
vertical files. 

Institutional Context. In response to several questions, we found references to institutional 
context. The importance of making a business case for data curation was a recurrent theme. Six graduates 
described how they had to get ‘buy in’ from scientists and administrators within their organization. One 
noted: “the real challenge is convincing researchers, research administration and even funders that data 
curation, not just data sharing, is a good return on investment.” Resource allocation challenges included 
“convincing data creators to consider the long-term curation of data in the face of time and budget pressure.” 
One graduate indicated a need for more preparation in “rigorous change management,” exclaiming: “It is 
really, really hard to be the bearer of change!” Communication across domains is vital to making progress. 
As one graduate stated: “I’m speaking to computer scientists, engineers and PhDs in math/physics. The 
link between LIS and data curation is not apparent to most and it took me a while to understand where 
languages of expertise met and where they diverged.” The lack of appropriate infrastructure was noted as 
a barrier: “I’m managing data as part of an assessment project... There is interest in promoting these services 
but we have to get a few things in place for infrastructure/resources assigned.” 


5.3 Careers 


The survey asked graduates about the number of positions held since graduation and length of time in their 
current position. A majority of respondents (96%) have held only one position, holding their current position for a 
mean of 1.6 years (std. dev. 1.2) with a range from 1 week to 5 years. Respondents were asked to indicate 
their level of satisfaction with data curation as a career. Most graduates were satisfied with data curation 
as a career (90%). The survey also gathered graduates’ level of agreement with statements about career 
opportunities. The majority agreed that they have opportunities to develop leadership skills (91%) and to 
advance their career (100%). 

In planning for workforce demands, retention of current employees is an important consideration. 
Data curators were asked whether they plan to still be working in data curation in 5 years. The majority (90%) 
planned to still work in data curation. The survey asked non-data curators if they plan to pursue 
employment in the data curation field in the future. More than half (62%) report an intention to pursue 
employment in the data curation field. 

Each career survey question was followed by an open-text box where many graduates elaborated 
on their answers. Overall, graduates described data curation careers in positive terms. For instance, a 
graduate described data curation as “what I love to do.” One graduate described data curation as “a super 
interesting field,” while another graduate noted “I like providing access to cool stuff.” One graduate enjoyed 
the diversity and opportunities associated with the work: “I do so many things every day, I am working 
with lots of totally different people, so much opportunity, I can’t imagine ever getting bored or stuck.” 

Respondents were asked to describe any previous education or experiences that helped them get 
their current job, responding with comments on formal and informal education and prior positions. A few 
graduates reported that previous degrees in domain sciences (e.g., geology, biology, agronomy) helped them 
get their current jobs; two of these graduates had both undergraduate and graduate degrees. A digital librarian 
mentioned a certificate in digital libraries helped them be competitive for their current job. 

Previous work experience, both paid and unpaid, was of interest to employers. Five graduates 
described internships in data curation, domain sciences, or relevant LIS (e.g., preservation, copyright). Eight 
respondents reported on the importance of their work experience prior to the LIS program, including 
experience in research settings and LIS settings. One credited their previous career in the non-profit sector: 
“I had a career in nonprofits for about 10 years, 4 of which were in training. My presentation and facilitation 
skills were a huge part of my getting this job, as well as my general tech-savvy-ness.” Five mentioned the 
value of their graduate student assistantships. For example: “My experience as a research assistant in 
UIUC's NSF-Funded [project title redacted] helped me learn about creative research methods and 
collaborative work in the social sciences...” 

In open-ended responses, seven graduates mentioned their fieldwork experiences helped them get 
their current job. A variety of fieldwork sites were reported such as universities, museums, and data centers. 
A respondent suggested: “More hands-on work would be useful for some of the classes. I know of at least 
two other people who went through the program and are confident of our grasp on theory, but not so much 
on our ability to apply that knowledge.” They also noted the value to employers of new graduates who 
have a combination of education and work experience. 


5.4 Program Assessment 


The survey gathered information on how effective the program was in preparing graduates to meet their 
professional obligations. From the response options, most graduates (74%) rated the program as very 
effective or effective in preparing them. Twenty-six percent reported the program was somewhat effective 
in preparing them. The survey also asked respondents to select which topics the program prepared them 
for in their work. Three-fourths of graduates reported that the program prepared them for metadata and 
documentation. About half or more of respondents felt prepared for preservation planning (61%), modeling and 
ontologies (57%), data management (52%), and programming (48%). See Figure 1. Overall, the qualitative 
responses were positive about the program. Graduates were appreciative of the opportunities to pursue data 
curation in their graduate program. For instance, one graduate responded, “I’m grateful for it. Those courses 
were what I loved about grad school.” 

Compliance 39 
Discovery 35 
Access & reuse 30 
Selection & appraisal 26 
Policy dev. 26 
Data processing 13 

Figure 1: Percent who felt prepared for data curation topics in their professional work 


In a series of open-ended questions, the survey asked graduates about the most useful topic in their career, 
topics missing from the curriculum, the most valuable aspects of the program, and recommendations for program 
improvements. Graduates’ responses broke down into four general categories: useful topics, useful courses, 
specific skills, and program organization. The three topics most cited as useful were: current trends and the 
data curation landscape; metadata and documentation; and computer programming. The data curation 
courses that graduates frequently mentioned as having been particularly useful in their careers were: Digital 
Preservation, Metadata in Theory and Practice, Information Modeling, and Systems Analysis and 
Management. Among specific skills mentioned by respondents were Python, XSLT, XML, Cocoon, RDF, 
and SQL. One graduate wrote, “I can’t think of a single course/area of study that I haven’t drawn upon in 
my work.” In addition to coursework, 30% of respondents cited the opportunity to build a strong network 
of instructors and colleagues as one of the most successful aspects of the program. Other identified strengths 
included hands-on work, breadth of knowledge and skills, and the rigor of the coursework. 

Graduates were also asked to make recommendations on program improvements. The most frequent 
recommendation, mentioned by four respondents, was providing a greater emphasis on computer 
programming. Other recommendations offered by two or more respondents included more emphasis on 
data-specific domains and change management. Respondents reflected on the importance of experiential learning 
by recommending more hands-on work in the classroom (22%), with a few suggesting that fieldwork should 
be a program requirement. In open-ended responses, a few graduates requested more engagement with 
domain communities and data producers either in the classroom or through fieldwork. One insightful 
comment identified how education will need to change as the field evolves: 


“grow the connection with practitioners, prep data curation students for program development roles 
in short term, then expand to include both management and detail-oriented worker paths (we will 
eventually need both, but the short term need is much more for visionary leaders, [in my opinion]).” 


The survey asked respondents whether they had completed a practicum or internship during their data 
curation program and whether they found it useful. Fifty-two percent of graduates completed a practicum or 
internship while studying. More than half the respondents (52%) recommended that students complete a 
practicum or internship. Of those who did not complete a practicum (n=10), seven wished they had 
completed one. Interestingly, a larger share of graduates who completed a practicum felt prepared for 
computer programming duties (75%) than of those who did not complete a practicum (18%). 


5.5 Continuing Education Needs 


The survey asked respondents whether they were interested in pursuing continuing education opportunities 
and whether they had pursued any additional education opportunities. A majority of respondents (87%) 
indicated that they were interested in continuing education, and 61% had already pursued additional 
education or professional development since graduation. Respondents not interested in continuing education 
were asked to select the reason. From the response options, the three respondents not interested in 
continuing education specified a lack of time and already having the skills that they need. 

Those interested in continuing education (n=20) were asked to rank the top three topics that they 
were most interested in pursuing. The most frequently ranked topics were metadata, modeling, data 
interpretation, and infrastructure (see Table 2). For the first ranked topics, metadata was the most frequent 
first choice. Surprisingly, programming was the only topic not ranked by any of the graduates. Additional 
continuing education topics suggested by respondents included research administration and proposal 
writing. 

Topics                      Any position (1-3)  Ranked #1  Ranked #2  Ranked #3 
Metadata                    30                  20         10         0 
Interpretation & analysis   30                  10         15         5 
Modeling                    30                  5          5          20 
Tech. infrastructure        25                  10         0          15 
Administration              20                  10         10         0 
Preservation systems        20                  10         10         0 
Selection & appraisal       20                  0          5          15 
Access & reuse              15                  5          0          10 
Training & instruction      15                  5          5          5 
Data management             15                  0          15         0 
Data quality                15                  0          10         5 
Discovery                   15                  0          5          10 
Data processing             10                  5          0          5 
Policy development          10                  5          0          5 
Preservation planning       10                  5          5          0 
Compliance                  10                  0          5          5 
Liaison & consulting        5                   5          0          0 
Data collection             5                   0          0          5 
Programming                 0                   0          0          0 


Table 2: Percent Ranking Top Continuing Education Topics (n=20) 


The most preferred delivery method for continuing education was a one-time event of 1-2 working days 
(30%), followed by a one-time event of 3-5 working days (15%), and a course of 1-5 contact hours per week 
for one semester (15%). Graduates who have already pursued continuing education opportunities (n=14) 
reported participating in a range of delivery modes, with a concentration in webinars (36%) and conferences 
(29%). Additional delivery modes were certification programs, semester-long courses, summer courses, 
workshops, seminars, and discussion groups. Code Academy and MOOCs were also specified for online 
options. 

Topics of completed continuing education included program management, scientific data processing, 
big data, computer science, web development, business analysis, semantic technologies, digital humanities, 
digital archives and records management, Resource Description and Access, and higher education 
administration. Respondents also mentioned learning specific tools and software, such as GIT, R, SQL, and 
Drupal, as well as the Python programming language. 


5.6 Future Trends 


Related to recent discussions of data curation and data stewardship issues and agendas (e.g. Jahnke et al. 
2012; RDSA, 2013), the survey asked graduates open-ended questions examining perceptions of emerging 
issues for data curation professionals. Responses included lively usage of verbs (such as managing, defining, 
bridging, educating, drumming up, convincing, and selling) to describe their future work in data curation. 
As one noted: “to some extent, I don’t see that one can escape managing digital files/records of some kind.” 
Many responses echoed the findings of the 2010 Research Data Workforce Summit on the need for 
engagement with current practice in data centers and the importance of communicating and bridging across 
domains (Varvel et al. 2010). 

Graduates foresee increasing levels of management of complex datasets and anticipate issues with 
data formats and sources, expressing concerns with video, media production, linked data, and streamed 
sensory data in “rapidly changing information environments.” Continuing education was reported as highly 
important for practicing data curation professionals. Many aspects of the data curation field are changing 
rapidly, including data formats, standards, and best practices, and professionals have an urgent need to stay 
informed and keep their knowledge up to date. Two respondents specifically cited the critical need for continuing 
professional development, not only in data curation but also in the domains where data are generated. The 
ability to collaborate and communicate was seen as vital for coordination of activities across communities 
and institutions. Six respondents referred to the need to keep scientists and administrators informed and to 
foster “buy-in” to new practices and services. Specific comments noted the importance of “creating 
awareness of data as something that is useful to be shared” and promoting awareness of data curation more 
generally. Respondents described having to make the business case for data curation and clarify how data 
curation differs from standard operating procedures. 

Finally, graduates expressed the need to clarify the role of LIS in relation to data curation. They 
highlighted the continuing need to address job titles, the meaning of a data curation degree, and the work 
of LIS because “the link between ‘LIS’ and data curation was not apparent to most...” Graduates described 
having to explain to scientists, employers and other research staff what data curation is and how it fits with 
other data work. 


6 Discussion 


Similar to respondents in Marshall et al.’s (2010) general study of recent LIS graduates, the Specialization 
in Data Curation graduates were employed with high levels of career satisfaction. Academic institutions 
were the top employer for data curation graduates, as seen with LIS graduates (Marshall et al. 2010) and 
archivists (Walch et al. 2006). While slightly more than half of graduates were not in data curation positions, 
per se or exclusively, data skills and knowledge were applicable to a wide range of institutional settings and 
positions as shown by 90% of graduates applying data skills to their current job. 

Also as seen in Marshall et al. (2010), most data curation graduates (74%) rated the program as 
very effective or effective in preparing them to meet professional obligations. In terms of job preparedness, 
respondents reported that the program prepared them highly in the areas of metadata, preservation 
planning, and modeling. This contrasted with Marshall et al. (2010) where general LIS graduates reported 
gaining basic knowledge of the field, information seeking, and ethics from their LIS education. The 
differences would be expected in the two studies of different aspects of the field, but it also suggests that a 
general LIS education would be far from adequate for current data curation positions. 

Respondents cited internships, practicums, and assistantships as key factors in their employability. 
As seen in Marshall et al. (2010), data curation graduates also reported that practicums or other hands-on 
experience were beneficial, suggesting practical experience as an area for program improvement. Experiential 
learning in the classroom and through external fieldwork is clearly advantageous, with three respondents 
suggesting that some form of fieldwork be required for completion of the specialization. As one respondent 
remarked, “just like any field there is a vast difference between theory and practice.” 

More than half of all respondents are actively engaged in liaison and consulting, user instruction 
and training, data management, metadata and documentation, and policy development. These are all areas 
where best practices are actively being developed. As would therefore be expected, respondents strongly 
recommended that data curation programs sustain a network of students, instructors, and alumni for longer 
term engagement with other professionals in similar roles. More curricular emphasis on data-driven domains 
and active domain engagement was also recommended. This is a clear need since data professionals will 
increasingly provide services directly to researchers who produce data and will work in partnership with 
them on data management planning, implementation, and development of tools and value-added services. 

Technical expertise was viewed as highly important, and it is particularly interesting that only 43% 
of respondents listed it as one of their duties. Graduates seem to perceive a need for such technical skills 
even if they are not currently utilizing them. Other common duties, such as data management and policy 
planning, are likely to remain a prominent feature of data curation job descriptions, though it is possible 
that focus on these will decrease as functional data infrastructures are established in conjunction with data 
management practices. 

Collaboration and communication across domains will be important for the emergent field of data 
curation. Graduates emphasized the importance of communicating the role of data curation in the larger 
research arena. Respondents cited having to define data curation and explain how a curation approach 
differs from their standard operating procedures. This suggests there is an increasing need to define the 
jurisdiction of data curators (Abbott, 1988) and disambiguate the various roles within the research process 
that can best be handled by data professionals (Varvel et al. 2010). 

As the field continues to evolve, iSchools will want to consider how to offer continuing education 
and professional development opportunities to their alumni. At present, graduates are mostly taking 
advantage of webinars even though nearly half of the respondents stated a preference for options like 
workshops lasting either 1-2 days or 3-5 days. The ubiquity and affordability of the webinar format will no 
doubt remain attractive to practicing professionals, but there is also a market for workshops and institutes 
that can fill the need for continuing development in emerging best practices. 


7 Conclusion 


The survey and placement analysis show that the Specialization in Data Curation is meeting workforce 
needs, as evidenced by level of employment and diversity of job types. According to placement analysis, 
about half of all positions were outside of academic libraries, with the second largest group in the corporate 
sector. While satisfaction with the program and with job placements was high, attention is needed in the 
areas suggested for improvement, especially in providing more experiential learning and more applied data 
curation opportunities. The advice from one respondent “to keep innovating” is important to all iSchools. 

The National Data Stewardship Alliance (2014) finds that “studies must be broadened and repeated 
over time to establish a robust evidence base from which generalizable guidance can be drawn” (p. 23). In 
support of this agenda, there are several expected next steps for further formative evaluation, including 
interviews with selected participants and continued tracking for longitudinal analysis of graduates early in 
their positions and as they have longer tenure in the field. We are also interested in expanding the study 
across more data curation programs to make progress on larger trends. As more data curation graduates 
enter the workforce, iSchools will have expanded opportunities to assess gaps in job preparedness, shape 
curricula based on reported job duties, and build programs that foster long-term professional development. 

Evaluation of progress is essential for the field to accurately scope its professional responsibilities 
in data curation and respond with quality education programs. With appropriate workforce preparation, 
data curation can move us beyond the perception of data as ‘a problem’ (Jahnke, 2012) to the opportunities 
and vision of a culture of knowledge built upon a 21st century foundation of data. 


Abstract 

Information literacy, defined as the skills and stages of successful information problem-solving, is often 
cited as a goal of education efforts at every level, pre-kindergarten through higher education. For these 
efforts to be effective, they must be guided by empirical research on information literacy. This study 
sought to determine the extent to which evidence of how students develop information literacy skills 
gleaned from empirical research is explicitly represented in a high-profile education policy initiative, the 
Common Core State Standards. Results reveal that not all stages of the information problem-solving 
process are represented in these standards, and that the crucial stage of Task Definition is not explicitly 
represented at all. Implications and directions for future research are presented. 


1 Introduction 


Successfully using information to accomplish tasks or solve problems, that is, information literacy, is crucial 
in an information society (American Library Association, 1989). In the digital age, the challenge shifts from 
seeking and finding information to the use of information to efficiently and effectively solve information 
problems. Since the concept was first identified by Paul Zurkowski in his address to the Information Industry 
Association in 1974 (Zurkowski, 1974), the value of information literacy has been established in scholarly 
literatures (Bruce, 1997; Chevillotte, 2010), professional practice (Association of College & Research 
Libraries, 2008), and pedagogy (Grassian & Kaplowitz, 2009; Julien, 2005). Moreover, in October 2009, the 
Presidential Proclamation of National Information Literacy Awareness Month in the United States 
established information literacy as a national priority (Obama, 2009). Successful solutions to problems 
personal, societal, and global will depend exclusively upon each individual’s degree of information literacy. 
In an overview of information literacy instruction, Grassian & Kaplowitz (2009) state: “In an era when new 
technologies and sources of information proliferate at breakneck speed, being information literate is not a 
luxury or a casual pastime. It is an essential survival skill for a changing world” (p. 2429). 

In order for students to be prepared for college and career readiness in the digital age, they must 
be information literate; thus, the goal of information literacy and related skills must be explicit in education 
policy documents. For example, information skills are key components of the Framework for 21st Century 
Learning, developed by the Partnership for 21st Century Skills, “a broad coalition made up of education 
nonprofits, foundations, and businesses working together to make 21st century education a reality for all 
students” (Partnership for 21st Century Skills, 2013). 

The Common Core State Standards (CCSS) Initiative in the United States is a high-profile 
education policy initiative seeking to establish a single set of clear educational standards for kindergarten 
through 12th grade in English language arts and mathematics. The standards were developed through the 
National Governors Association and the Council of Chief State School Officers, and have been voluntarily 
adopted by forty-five states, the District of Columbia, four territories, and the Department of Defense 
Education Activity to date (National Governors Association & Council of Chief State School Officers, 2013). 
The standards are comprehensive, setting expectations for reading, writing, speaking and listening, language 
and mathematics for grades K-12. From the CCSS Initiative mission statement: “The standards are designed 
to be robust and relevant to the real world, reflecting the knowledge and skills that our young people need 
for success in college and careers. With American students fully prepared for the future, our communities 
will be best positioned to compete successfully in the global economy.” The CCSS Initiative also claims that 
the criteria by which the standards were developed include evidence and scholarly research (National 
Governors Association & Council of Chief State School Officers, 2013, FAQ). This study seeks to investigate 
this claim. 

Given the value placed on information literacy in the U.S. (Obama, 2009), and the impact of what 
has become a national education policy initiative, this study sought to determine the extent to which 
evidence of how students develop information literacy skills gleaned from empirical research is explicitly 
represented in the CCSS Initiative. The first goal was to detect evidence of any of the skills or stages of the 
information problem-solving process, as described by the Big6 information literacy process model, in the 
CCSS Initiative; the second goal was to detect evidence of the skills related to the particular stage of Task 
Definition. 


1.1 Describing Information Literacy Behaviors 


Decades of research on information literacy have contributed to an understanding of how people successfully 
access, evaluate, use, and share information to answer questions, complete tasks, and solve problems. 
Information literacy may be described as the skills and stages of the information problem-solving process; 
that is, those who are successful at each stage of the information problem-solving process are information 
literate. Models of information problem-solving behavior include Kuhlthau’s Information Search Process 
(2004) and Eisenberg & Berkowitz’ Big6 information literacy process (1990). Several studies have adopted 
an information problem-solving model as an operational definition of information literacy. In reviewing the 
skills necessary for life in an information society, Brand-Gruwel (2005) concludes: “All the skills, knowledge 
and attitudes, which are needed...can be defined as information literacy...or as information problem-solving” 
(p. 488). These models provide a framework for organizing the many aspects of information behavior as 
described by Wilson (1999) and a focus for an investigation of information literacy. 

The Big6 information literacy process (Table 1) by Eisenberg & Berkowitz (1990) describes the 
process of successfully solving an information problem. This model describes the first stage as Task 
Definition, in which the problem-solver defines the task or problem to be solved, and then identifies the 
information needed to solve the problem. From there, the problem solver engages in information seeking 
strategies, location and access of information, use of information, information synthesis, and evaluation of 
the process and product. The process is often not linear, and stages may be repeated throughout the process. 
The development of the Big6 was informed by practice, but it has been employed as a conceptual framework 
in several studies of information problem-solving (Brand-Gruwel et al., 2009; Gerjets & Hellenthal-Schorr, 
2008). Brand-Gruwel et al. (2005) studied expert and novice higher education students in an effort to 
decompose the Big6 information literacy approach into cognitive components, and to determine the key 
components in the information problem-solving process. They conclude that the Big6 information literacy 
approach was an accurate description of stages in information problem-solving, and useful in the 
decomposition of cognitive components into related categories. Murray (2008a, 2008b, 2010, 2011) has done 
extensive work aligning various standards to the Big6 model. Neither Kuhlthau’s ISP (2004) nor other 
models have been aligned to the standards. 


iConference 2014 David Willer et al. 


Stage | Sub-stages | Actions 
1. Task Definition | 1.1 Define the information problem | What is my current task? 
 | 1.2 Identify information needed (to solve the information problem) | What are some topics or questions I need to answer? What information will I need? 
2. Information Seeking Strategies | 2.1 Determine all possible sources (brainstorm) | What are all the possible sources to check? 
 | 2.2 Select the best sources | What are the best sources of information for this task? 
3. Location and Access | 3.1 Locate sources (intellectually and physically) | Where can I find these sources? 
 | 3.2 Find information within sources | Where can I find the information in the source? 
4. Use of Information | 4.1 Engage (e.g., read, hear, view, touch) | What information do I expect to find in this source? 
 | 4.2 Extract relevant information | What information from the source is useful? 
5. Synthesis | 5.1 Organize from multiple sources | How will I organize my information? 
 | 5.2 Present the information | How should I present my information? 
6. Evaluation | 6.1 Judge the product (effectiveness) | Did I do what was required? 
 | 6.2 Judge the process (efficiency) | Did I complete each of the Big6 Stages efficiently? 

Table 1: Big6 Information Literacy Process (Eisenberg, 2007) 


1.2 Information Literacy Research 


Research on information literacy, from library and information science and cognate fields, contributes to an 
understanding of information behaviors that lead to learning and successful problem-solving, and informs 
pedagogical practice and education policy. Rather than focus on one area of human information behavior, 
the study of information literacy takes a comprehensive approach to the examination of information 
behavior that results in successfully completing a task, answering a question, or solving a problem. Such a 
process approach enables the identification of particular practices that lead to, or thwart, successful 
outcomes. 

In a series of studies, Head & Eisenberg (2012; 2009a, 2009b, 2010a, 2010b) provide the most current 
insights into the information literacy practices of young adults in higher education. Among the findings 
from these studies is that students seek context for their course-related and everyday life-related research. 
Students need context in order to formulate the problem situation and generate a plan for resolution; the 
information problem is situated within a context that is crucial to recognize. A related finding is that college 
students report difficulty at the initial stage of the research process. These studies also find that students 
are rather savvy with the information systems and services available to them, yet they rely on a small and
familiar set of resources and a rote method for conducting research activities, and they experience difficulty getting
started. Gross (2005, 2007) found that students tend to overestimate their information literacy skills. Many
students likely equate information literacy with just one stage of the entire process, the location and access 
of online information, without regard for the other crucial stages of the information problem-solving process. 


1.3 Task Definition: A Crucial Stage in Successful Information Problem-Solving 


While research on information literacy investigates information behaviors across the entire process of 
information problem-solving, a pattern emerges—a pattern indicating that the initial stage of Task 
Definition is worthy of further investigation. According to Eisenberg & Berkowitz (1990): “Most people 
spend too little time on task definition. The tendency is to push ahead even though they have only a general 
or vague understanding of what it is they are seeking to accomplish. By spending time considering the
information problem and then articulating a clear understanding of (a) the information problem and (b)
specific information needs related to that problem, people can move much more efficiently toward solutions”
(p. 6). This observation suggests that skill at the initial stage of the information problem-solving process is crucial to success, and that
difficulty at this stage typically leads to inefficient and ineffective information behaviors in later stages of
seeking, search, and use.

This pattern is corroborated by research in cognate fields. Literature from the learning and cognitive 
sciences reveals that domain experts are more successful at solving problems within their domain because 
they are more effective at the initial phase of the problem-solving process, termed problem representation 
(Blessing & Ross, 1996; Chi, 2006; Hardiman et al., 1989). The knowledge of domain experts is organized 
differently than that of novices, and is brought to bear more effectively on problem-solving situations than 
that of domain novices (Bransford et al., 2000). Domain experts categorize problems according to type 
through analogical reasoning, and these types are described by the concept of schema (Novick & Bassok, 
2005). Embedded within a problem schema is a strategy for successfully solving the problem
(McNamara, 1994; Novick, 1988). Bransford et al. (2000) recognize the importance of the initial stage of 
problem representation: “An important aspect of learning is to become fluent at recognizing problem types 
in particular domains ... so that appropriate solutions can be easily retrieved from memory” (p. 44). Thus,
research from multiple fields indicates that skills associated with the initial stage of Task Definition are 
crucial to eventual success in information problem-solving. 

Again, empirical research on learning and problem-solving serves to inform instructional practice 
and education policy. These findings are directly relevant to the way in which students in pre-Kindergarten 
through higher education programs are taught to access, evaluate, use and share information for the purpose 
of answering questions, completing tasks or assignments, and solving problems. 


1.4 Common Core State Standards 


Applied to education policy, the term standards implies both a model of achievement and the gauge by 
which the achievement is measured. Typically, educational standards represent the opinions of experts on 
what students are capable of and should be doing at a particular grade level (Lee, 2002; Porter, Polikoff, & 
Smithson, 2009). Standards establish what is to be learned, the defined goals and objectives for instruction 
and learning; they are often established for each subject and grade level by state agencies and subject area
associations. The national standards movement in the U.S. emerged out of the educational reform movement 
of the 1980s and 1990s that was sparked by the report, A Nation at Risk (National Commission on 
Excellence in Education, 1983). The 2001 reauthorization of the Elementary and Secondary Education Act, commonly referred
to as the No Child Left Behind Act, required testing and the development of state standards. As a result,
all fifty states set their own educational standards. These standards varied in quality (Carmichael, Martino, 
Porter-Magee, & Wilson, 2010). According to Goertz (2010): “Policy makers must reach consensus on the
type, content, and specificity of the standards; determine who will develop the standards; and facilitate the 
implementation of the standards” (p. 52). 

Standards have frequently been created by a group of designated experts (Ballard, 2009; Barnett, 
2008; Lee, 2002). Six U.S. national professional organizations, the American Association of School Librarians 
(AASL), the International Society for Technology in Education (ISTE), the National Center for History in 
the Schools (NCHS), the National Council for the Social Studies (NCSS), the National Council of Teachers 
of Mathematics (NCTM), and the National Research Council (NRC), each designated a task force of experts 
to develop a set of standards within their domain. According to Goertz (2010), the NCTM has used 
consensus methods to develop standards. The United States History Standards were developed through a 
“broad-based national consensus building process” (National Center for History in the Schools, Crabtree, & 
Nash, 1994, p. iii). Consensus-building appears to be the method used by the other professional organizations 
as well for developing standards sets. This consensus-building process has led to the criticism of standards
as being bloated, since compromise has equated to addition without subtraction (Lee, 2002; Marzano &
Kendall, 1996, 1998; Phillips, 2009; Schmoker & Marzano, 1999). Phillips (2009) sharpened this
criticism, calling the standards-creation process a “political sausage factory, in which the most important
goal is to respect each individual committee member’s personal, cherished opinions” (p. 28).

There is little research that supports placement of particular content at a particular grade level 
(Kendall, 2001). According to Zenger & Zenger (2002): “No solid basis exists in the research literature for 
the ways we currently develop, place and align educational standards in school curricula. If this sounds 
shocking, it should not. The same holds true for placing subject-matter content at specific grade levels 
(scope and sequence)” (p. 212). Content has been placed in scope and sequence documents based on
tradition, individual teachers’ expertise, textbook coverage, professional judgment, or current
practice, and standards documents appear to be no different (Zenger & Zenger, 2002, 2003).

The current state of the standards movement in the U.S. is the establishment of the Common Core 
State Standards (CCSS) Initiative. The CCSS were written by convening work groups composed of experts
in content areas and teaching, researchers, and other interested stakeholders. These work groups drew on
existing state standards and their own experiences to draw up the CCSS. Criteria used to develop the CCSS
required that they be rigorous, clear and specific, teachable and learnable, measurable, coherent, and
internationally benchmarked (National Governors Association & Council of Chief State School Officers, 
2013). 

The CCSS are replacing the former standards sets in forty-five states. It is important to find out if 
the shortcomings of previous sets of standards have been addressed. According to the Standards Setting 
Considerations, “21st century skills” have been implemented where possible (National Governors 
Association & Council of Chief State School Officers, 2013). These skills are not a separate set of standards
but are incorporated into the various disciplines within the standards. Two of the 21st century skillsets are
information literacy and problem-solving (Partnership for 21st Century Skills, 2009). This paper examines 
the CCSS for the inclusion, or exclusion, of explicit references to information problem-solving skills. 


1.5 Research Questions 


As an education policy initiative dictating the learning expectations for students in grades K-12 and adopted 
by forty-five states, the District of Columbia, four territories, and the Department of Defense Education 
Activity, the CCSS Initiative represents a major education policy statement. The overarching goal of the 
research described in this paper is to determine whether research on information literacy is reflected
in this policy. This study first seeks to identify explicit references to the skills or stages of the information 
problem-solving process, as described by the Big6 information literacy process, in the CCSS Initiative. It 
then seeks to identify explicit references to the skills related to the particular stage of Task Definition. 
The following research questions emerge from this problem space: 


1) Which stages of the information problem-solving process, as described by the Big6 Skills, are 
reflected in the Common Core State Standards Initiative? 

2) How are skills specific to the initial stage of Task Definition in the information problem-solving 
process reflected in the Common Core State Standards Initiative? 


2 Method 


2.1 Research Design 

The CCSS standards were analyzed as part of an exploratory content analysis. This was one phase of a 
larger research process categorizing standards from both the CCSS and the American Association of School 
Librarians (AASL) into the corresponding stage of the Big6 information literacy process. This sorting 
method was chosen in order to have multiple individuals with expertise in information literacy review
the standards statements and assign them to Big6 stages. It combined a
straightforward system with greater assurance that the standards statements assigned to a particular
Big6 stage belonged to that stage than would have been possible had only one or two individuals made the
assignments.

The researcher identified and recruited five content analysts with expertise in information literacy 
to serve as coders (see Table 2). Initial contact and all communications were by email.


Team Member | Expertise
1 | National Board Certified Social Studies teacher; doctoral candidate in Information Science
2 | School librarian, Information School lecturer; doctoral candidate in Information Science
3 | Research Assistant, National Center on Quality Teaching and Learning; School librarian and a classroom teacher; doctoral student in Information Science
4 | School Librarian, Information School lecturer; doctoral student, School of Education
5 | Director of Library and Media for state level office of Superintendent of Public Instruction


Table 2: Content Analysts Qualifications 


This study uses the term “standards statement” to describe a discrete statement of what a student should 
be able to do or know. This is equivalent to the AASL term “benchmark” and the CCSS use of the phrase 
“grade-specific standard.” The CCSS standards statements were chosen from Grades 2, 5, and 8 for grade-level
correspondence with AASL standards statements in order to make an equivalent comparison at a later
date. 

The CCSS are made up of strands, anchor standards, and grade-specific standards; as noted, this 
research focuses on the grade-specific standards and refers to these as standards statements. Content 
analysts were asked to review all the CCSS standards statements (377) in English/Language Arts and 
Mathematics for Grades 2, 5, and 8, and categorize them according to stages of the Big6 information literacy 
process. 

Survey instruments were created using IT Connect Catalyst tools from the University of 
Washington for each content sub-area of the CCSS. Examples of these areas from English Language Arts 
& Literacy in History/Social Studies, Science, and Technical Subjects include: Reading Standards for 
Literature K-5, Reading Standards for Informational Text K-5, Writing Standards, Speaking and Listening 
Standards K-5, and Language Standards 6-12. Examples from Mathematics include: Operations and 
Algebraic Thinking, Numbers and Operations in Base 10, and Geometry. A total of 15 separate surveys
were created.


2.2 Data Collection 


Content analysts independently reviewed 377 CCSS standards statements grouped into 15 surveys. Analysts 
sorted each of the standards statements into a stage or sub-stage of the Big6 information literacy process 
(see Figure 1 for example); analysts were given the choice to identify the standards statement as “not
related” to the Big6. Moreover, analysts were able to evaluate standards statements as “Unable to tell” for
statements that were ambiguous or poorly worded and not clearly aligned with any stage or sub-stage.


Rating options: 1 | 1.1 | 1.2 | 2 | 2.1 | 2.2 | 3 | 3.1 | 3.2 | 4 | 4.1 | 4.2 | 5 | 5.1 | 5.2 | 6 | 6.1 | 6.2 | Unable to tell | Not Big6/Little 12

1. Determine the meaning of words and phrases in a text relevant to a grade 2 topic or subject area.

2. Know and use various text features (e.g., captions, bold print, subheadings, glossaries, indexes, electronic menus, icons) to locate key facts or information in a text efficiently.

Figure 1: English/Language Arts Informational Text to Big6 Stage Rating Chart. 


After all 15 surveys from each of the five content analysts were submitted to the researcher, a 60% level of
consensus was used to determine where each item fit into the Big6 stages. This meant that at least three out of the
five team members agreed that the individual standards statement fit into a particular Big6 stage. In some
instances team members chose different sub-stages of the Little 12 within the same Big6 stage, but since
the end result sought was categorization at the Big6 level, these variations within categories are not discussed
here. The researcher then tabulated the results (see Figure 2). In Row 1, the standards statement was
categorized as Big6 Stage 4, Use of Information, while in Row 2, the standards statement was categorized
as Big6 Stage 3, Location and Access.

Using a 60% level of consensus, 81% of the 377 CCSS standards statements could be
categorized into either a Big6 stage or the Not Able to Tell/Not Big6 category, compared with only
41% when using a consensus level of 80%. The 60% level of consensus was chosen in
order to maximize the number of standards statements included in the research project. The researcher 
made a decision to err on the side of including more standards statements rather than to omit standards 
statements. 
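
The consensus rule described above can be illustrated with a short sketch. The following Python is not the researcher's actual tabulation procedure; it is a minimal illustration, assuming that each sub-stage response (e.g., 4.1) is collapsed to its parent Big6 stage before votes are counted and that non-stage responses are counted under their own labels. The function names `to_stage` and `consensus` are hypothetical, and the two response lists are the sample rows shown in Figure 2.

```python
from collections import Counter
from fractions import Fraction


def to_stage(response):
    """Collapse a coder response such as '4.1' to its Big6 stage number (4).

    Responses that are not stage codes (e.g. 'Not Big6/Little 12',
    'Unable to tell') are returned unchanged. Keeping them as separate
    labels is an assumption of this sketch.
    """
    head = response.split(".")[0].strip()
    return int(head) if head.isdigit() else response


def consensus(responses, threshold=Fraction(3, 5)):
    """Return the category chosen by at least `threshold` of the coders,
    or 'No consensus' when no category reaches that share."""
    votes = Counter(to_stage(r) for r in responses)
    category, count = votes.most_common(1)[0]
    if Fraction(count, len(responses)) >= threshold:
        return category
    return "No consensus"


# Sample responses from the five content analysts (rows of Figure 2).
row1 = ["Not Big6/Little 12", "4.1", "4.1", "4.1", "Not Big6/Little 12"]
row2 = ["4.1", "3.2", "3.2", "3.2", "3.2"]

print(consensus(row1))                  # 4
print(consensus(row2))                  # 3
print(consensus(row1, Fraction(4, 5)))  # No consensus
```

With five coders, the 60% threshold corresponds to three matching votes and the 80% threshold to four, which is why Row 1 would drop out under the stricter rule while Row 2 would remain.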


Row 1:
1. Determine the meaning of words and phrases in a text relevant to a grade 2 topic or subject area.
Possible points 0

Date | Response
4/07/2013 1:31 PM | Not Big6/Little 12
4/12/2013 9:18 AM | 4.1
3/24/2013 10:20 AM | 4.1
4/06/2013 8:38 PM | 4.1
3/20/2013 9:36 AM | Not Big6/Little 12

Row 2:
Date | Response
4/07/2013 1:31 PM | 4.1
4/12/2013 9:18 AM | 3.2
3/24/2013 10:20 AM | 3.2
4/06/2013 8:38 PM | 3.2
3/20/2013 9:36 AM | 3.2

Participant information (email address) has been omitted.


Figure 2: Sample Results 


2.3 Results


The final tally of CCSS standards statements categorized into Big6 stages is shown in Table 3. At this
stage, we are reporting on the number of standards in each Big6 category, using a simple tally system with 
the goal of detecting patterns, not statistical significance. As noted, the intent here is to determine what is 
present in the CCSS in terms of the information literacy skill or stage as represented by the Big6 information 
literacy process model. 

Table 3 shows that, according to this team of content analysts, the CCSS has an emphasis on Big6
Stage 5 (Synthesis), and to a lesser degree Stage 4 (Use of Information). However, it is also apparent that
no CCSS standards statements are related to Big6 Stages 1 (Task Definition), 2 (Information Seeking Strategies), or 6
(Evaluation). Of the 123 standards statements out of 377 that we placed into a Big6 stage, 86 were at Big6
Stage 5 (Synthesis), another 33 were at Stage 4 (Use of Information), and 4 were at Stage 3 (Location and
Access). There were no CCSS standards statements that were categorized into the remaining Big6 stages. 


NB = Not Big6; NC = No Consensus

Title | # Stds | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Stage 6 | NB | NC
ELA History/Soc Sci/Tech Writing | 20 | 0 | 0 | 0 | 1 | 12 | 0 | 6 | 1
ELA History/Soc. St Reading Literature | 10 | 0 | 0 | 0 | 1 | 3 | 0 | [illegible] | [illegible]
ELA Information Text | 30 | 0 | 0 | 1 | 7 | 10 | 0 | [illegible] | [illegible]
ELA Language | 70 | 0 | 0 | 3 | 5 | 2 | 0 | 50 | 10
ELA Literature | 27 | 0 | 0 | 0 | 5 | 3 | 0 | 12 | 7
ELA Reading Standards Foundational Skills | 17 | 0 | 0 | 0 | 4 | 0 | 0 | 13 | 0
ELA Science & Technical Reading Literacy | 10 | 0 | 0 | 0 | 5 | 2 | 0 | 2 | 1
ELA Speaking & Listening | 29 | 0 | 0 | 0 | 3 | 13 | 0 | 7 | 6
ELA Writing | 61 | 0 | 0 | 0 | 2 | 30 | 0 | 23 | 6
Math Geometry | 19 | 0 | 0 | 0 | 0 | 2 | 0 | 11 | 6
Math Measurement & Data | 20 | 0 | 0 | 0 | 0 | 4 | 0 | 9 | 7
Math Number System 8th Grade | 24 | 0 | 0 | 0 | 0 | 5 | 0 | 16 | 3
Math Numbers & Operations Fractions | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 6
Math Numbers & Operations in Base 10 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 4
Math Operations & Algebraic Thinking | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 2
Totals | 377 | 0 | 0 | 4 | 33 | 86 | 0 | 183 | 71


Table 3: Common Core State Standards by Big6 Category 
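
The proportions discussed in this paper follow directly from the totals row of Table 3. As a quick arithmetic check, the sketch below recomputes them in Python; the variable names are illustrative only, and the counts are copied from the totals row.

```python
# Totals row of Table 3: number of standards statements per category.
totals = {
    "Stage 1": 0, "Stage 2": 0, "Stage 3": 4,
    "Stage 4": 33, "Stage 5": 86, "Stage 6": 0,
    "Not Big6": 183, "No consensus": 71,
}
n_statements = 377

# The categories should account for every statement reviewed.
assert sum(totals.values()) == n_statements

in_big6 = sum(n for name, n in totals.items() if name.startswith("Stage"))
with_consensus = n_statements - totals["No consensus"]


def share(count):
    """Percentage of all reviewed statements, rounded to one decimal place."""
    return round(100 * count / n_statements, 1)


print(in_big6, share(in_big6))                # 123 32.6
print(with_consensus, share(with_consensus))  # 306 81.2
print(share(totals["Not Big6"]))              # 48.5
```

The 81% figure matches the share of statements categorized at the 60% consensus level, and the Not Big6 share corresponds to the roughly half of the statements that did not fit the Big6 model.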


3 Discussion


This study first sought to determine if the skills or stages of the information problem-solving process, as 
described by the Big6 information literacy process, were evident in the CCSS statements. It then sought to 
determine if the crucial skills related to the initial stage of Task Definition were evident in these standards 
statements. At a broader level, this study sought to find evidence that current research on information 
literacy is reflected in the CCSS Initiative. The CCSS Initiative is an important education policy initiative in
the United States as 45 states have adopted these educational standards and will be modifying curricula 
and assessments to align with them. A goal of the CCSS Initiative is to prepare students for college and 
career readiness in the information age. Many authors have argued that information literacy skills are an 
important skillset for students in the 21st century. 

The method used was an exploratory content analysis to examine how standards statements from 
the CCSS could be categorized into the Big6 information literacy process model by a team of content 
analysts with expertise in the conceptual and pedagogical aspects of information literacy. The results were 
analyzed to identify areas of consensus; 123 standards statements from the CCSS corresponded to stages of 
the Big6 model, according to the content analysts. 

The standards statements of the CCSS provide clear evidence of incorporating Big6 Stage 5 
(Synthesis)—the most frequently-identified Big6 stage, showing up 86 times. The stage of Synthesis is where 
students organize and present the information they have found, in the process creating their own new and 
unique answer to the information problem. Examples of Synthesis in the CCSS are included in Table 4
below. Additionally, Big6 Stages 3 and 4 were also present in the CCSS, though to a lesser degree.
These represent the location and access of information, and engaging with and extracting relevant 
information. These skills go beyond rote memorization—they require students to apply their knowledge and 
create new products. 

Approximately half of the CCSS standards statements did not fit into the Big6 information literacy 
process model. It is clear from Table 3 that one reason for this is that the CCSS Math standards statements,
representing about one quarter of the total, contained only 11 standards statements that were related to 
information problem-solving. This in itself is surprising, as it would seem that Task Definition would be an 
important part of problem-solving in Math. One possible explanation for this is that the team was unable 
to reach consensus on 28 of the Math standards statements; while indeed applicable, the Big6 is not typically 
applied to mathematical problems. 

The stage of Task Definition, identified as crucial by studies in various fields, was not detected in the CCSS by the
content analysts in this study. Problems assigned in schools tend to be well-structured problems. Well-structured
problems have clear parameters and often have known, or correct, answers. In this environment,
the problem is frequently seen as being assigned by the teacher and the student’s responsibility is to find 
the answer. This reliance on well-structured problems is a possible explanation for the lack of the Task 
Definition stage in the CCSS. School is a formal learning environment, and as such the problems presented 
to students tend to be well-structured problems with a known solution that students are supposed to reach.
Informal learning environments outside of school have ill-structured problems that often have no single
answer or competing answers of relatively equal value. To prepare students for their future, we want them 
to be able to solve these ill-structured problems that are typical everyday-life problems, not just the well- 
structured problems that students encounter in formal learning environments. One of the goals of education 
is the transfer of learning to new situations. Being able to analyze a problem and define what information 
is needed and what needs to be done to solve the problem is the most important part of information 
problem-solving, yet explicit standards for doing this are not apparent in the CCSS. 

Explicit evidence of two other stages is also missing. Big6 Stage 2 (Information Seeking Strategies) represents an
important skill for students to possess. Students must be able to identify potential sources of information
for a particular problem and know how to obtain access if they are to deal with an information problem
successfully. Big6 Stage 6 (Evaluation) enables students to identify both when they have completed a
project successfully and what areas could be improved in future work. The lack of these skills seems to
reflect a reliance on a well-structured problem model of education. Students in this model have information 
provided for them, though they are expected to use the information and to synthesize some type of product 
based on it. Additionally, in this model the teacher is seen as the evaluator, thus de-emphasizing student 
self-evaluation. 

The CCSS Initiative does include a set of standards that are related specifically to research skills. 
These are based on the Anchor Standard, “Research to Build and Present Knowledge.” However, these 
research standards make only a weak attempt to go beyond Big6 Stages 4 and 5. There are a total of 15 
standards statements in this category across grades 2, 5, and 8. This anchor statement is included in both 
the strands of Writing and Writing Standards for Literacy in History/Social Studies, Science, and Technical 
Subjects 6-12, resulting in some duplication of standards statements. These 15 statements included three 
standards statements at Big6 Stage 4, two at Stage 5, five that the team rated as “Unable to tell,” and five
on which the team did not reach consensus. Many of these statements incorporate multiple concepts in one
statement, making it difficult to identify the main thrust of the statement and to associate it with a
particular Big6 stage. This lack of clarity also suggests that practitioners may have difficulty identifying
concretely what students are to accomplish. 


Example CCSS Statement | Big6 Stage
Know and use various text features (e.g., captions, bold print, subheadings, glossaries, indexes, electronic menus, icons) to locate key facts or information in a text efficiently. | Stage 3 Location and Access
Distinguish among facts, reasoned judgment based on research findings, and speculation in a text. | Stage 4 Use of Information
Support claim(s) with logical reasoning and relevant, accurate data and evidence that demonstrate an understanding of the topic or text, using credible sources. | Stage 5 Synthesis
Provide a concluding statement or section that follows from and supports the argument presented. | Stage 5 Synthesis

Table 4: Comparison of CCSS to Big6 Equivalents 


4 Conclusion 


This conclusion includes a discussion of the limitations of this research, its implications, and areas of future 
research.


4.1 Limitations


This research focused on CCSS in grade levels 2, 5, and 8. It is unknown which Big6 stages are represented 
in CCSS at other grade levels. The research did not attempt to look at CCSS in the high school grades and 
it could be that areas of the Big6 information literacy process that are completely lacking in grades 2, 5, 
and 8, such as Task Definition, are present in standards for grades 9-12. 

A second limitation was the group of experts recruited for content analysis. This group consisted 
of only five members, and though it did include members with expertise at a variety of grade levels and in 
librarianship, their experience was greater in library skills than in classroom teaching. A different group 
made up of those with different types of teaching experience might categorize the CCSS in a different 
manner. Additionally, the sorting methodology used was intended to be straightforward and aimed at erring
on the side of including standards statements in the Big6 stages rather than attempting a definitive
assignment of standards statements to appropriate Big6 stages.

Also, finding alignment between the Big6, which describes process stages, and the CCSS, which describe outcomes,
may be problematic. Many of the process stages described by the Big6 information literacy process are
likely implied, but not stated explicitly, in the CCSS standards statements.


4.2 Implications 


The CCSS Initiative may be insufficient to meet the goal of preparing students for college and career 
readiness. The most important step in information problem-solving is identifying the task and the CCSS 
Initiative simply does not address this stage. 

Additionally, it is possible that school districts may reduce the use of teacher-librarians as 
instructors through interpreting the CCSS’s claim to incorporate 21st century skills as meaning students 
will learn information problem-solving skills through classroom instruction that is informed by the CCSS 
Initiative. This would be a mistake as information problem-solving skills are inadequately represented in 
the CCSS. 

This study indicates that adding specific standards to the CCSS that are related to all six of the
stages of information problem-solving is necessary to fully prepare students to meet the challenges of the
21st century. The CCSS Initiative could be improved by clearly developing standards that address each 
stage of the information problem-solving process. 


4.3 Future Research 


The first area for further research will be to examine grade levels other than 2, 5, and 8, especially the high 
school grade levels, since no high school levels were examined. This would clarify whether or not the grade 
levels studied are exceptions or whether they are representative of the CCSS as a whole. 

A second area is asking expert, experienced teachers to rate the standards for developmental 
appropriateness and importance. Expert teachers with years of teaching experience should be reliable judges 
as to what the correct level of any given standard is and whether or not the CCSS have hit this mark. 
Additionally, one of the common criticisms of standards in general is that they are bloated, and that 
consensus is reached by addition. Asking expert, experienced teachers to evaluate the importance of 
individual standards statements from the CCSS will give an indication of whether this criticism is true of 
the CCSS. 

Further research is needed to determine whether implied skills that are related to information 
problem-solving must be taught explicitly, the consequence of not stating these explicitly in standards 
statements, and methods for how these are taught most effectively. A specific question this raises is, to 
what extent do teachers realize and act to instruct skills that are implied but not explicit in standards 
statements?

Information literacy is one of the key skills for living and working in the 21st century. The CCSS
Initiative is an important education policy initiative affecting education in 45 of the 50 states. The CCSS 
Initiative in its current form fails to clearly address several areas of the information problem-solving process, 
and most importantly fails to include Task Definition. To fully address the needs of the nation’s students 
in the 21st century, these faults must be addressed.


Abstract

Recent research suggests that a preference for navigation by folder to re-find files endures notwithstanding 
dramatic improvements in support for search as an alternate method of return. This paper describes a 
study that confirms this finding for files but that observes a distinctly different pattern of preference for 
re-finding email messages. After a delay of 2 to 4 weeks, search was the most common first choice for the 
return to email messages. A third, compound method was predominant for the return to Web information: 
The use of character-by-character “auto-complete” search was frequently followed by a hyperlink 
navigation to reach a targeted web page. Results point to the need for an integrated support of search 
and navigation methods during re-finding attempts. Results also suggest that support for re-finding 
begins with support for the initial “keeping” of information. Finally, results affirm two basic tenets of 
personal information management (PIM): 1. The need to consider multiple forms of information. 2. The 
need to consider a PIM activity such as re-finding within a larger context that includes other activities 
of PIM and considers the life cycle of personal information. 
1 Introduction 


Re-finding — the return to information previously accessed — is both a common everyday activity and an 
important area of research in the field of Personal Information Management (PIM) (Capra & Pérez- 
Quiñones, 2005; Elsweiler, Baillie, & Ruthven, 2008). 

People may most remember their protracted, time-consuming efforts to re-find or their outright 
failures. Less noteworthy but perhaps in aggregate even more costly are the routine steps taken with each 
incidence of re-finding. Re-find the email message you are meaning to answer. Re-find the several more you 
may need to reference in order to answer accurately. Re-find the Web page with information about an 
upcoming social event. Re-find the document you send people with parking instructions for a visit to your 
office. Re-find the pictures you took last weekend. We may be reminded of Licklider’s observation in his 
highly influential article, “Man-Computer Symbiosis,” concerning his own work day: 


About 85 per cent of my ‘thinking’ time was spent getting into a position to think, to make a 
decision, to learn something I needed to know . . . my choices of what to attempt and what not to 
attempt were determined to an embarrassingly great extent by considerations of clerical feasibility, 
not intellectual capability (Licklider, 1960, p. 4) 


When we reckon with the many small acts of re-finding in a typical day, a few steps or seconds saved with 
each successful act of re-finding could really add up. It is, therefore, important to understand better the 
methods by which people choose to return to information currently as a step towards understanding how 
we might improve upon these methods of re-finding. We aim for circumstances of PIM in which the “clerical 
tax” we pay is at a minimum.1 

Consider the re-finding of files. A time-honored method of returning to documents on the “desktop” 
(i.e., on a person’s computer) is to proceed through a series of steps: navigate through a folder hierarchy to 
the folder where the file is thought to be located and then scan the folder contents to recognize the sought- 
for file. This method will be referred to throughout the current article as folder-based navigation. 
“Navigation”2 is to emphasize the central importance of recognition as opposed to recall (Lansdale, 1988). 
“Folder-based” is to distinguish this kind of navigation from several others (navigation through the 
hyperlinks on a web page, for example). Folders and the folder hierarchy play an essential role in this 
particular variation of navigation. 

Navigation, in its many variations, stands in contrast to a more direct method, commonly called 
“search”, wherein people type a few keywords in the hopes of recognizing the sought-for file in a list of 
matches that are returned. A search that bypasses layers of hierarchy would seem to be potentially much 
faster than folder-based navigation and so a preferred choice of re-finding method. To the contrary, Barreau 
and Nardi (1995) found that people overwhelmingly preferred navigation to search as a method of re-finding 
files. 

The years since their studies (done in 1993 and 1994) have seen dramatic improvements in desktop 
search. And yet, a preference for navigation as a re-finding method apparently endures. Boardman & Sasse 
(2004) report that people expressed a preference for navigation over search although the strength and nature 
of this preference varied depending upon the form of information involved. Folder-based navigation was 
frequently mentioned as a preferred method of return to files — supplemented, on occasion, by a sort and 
scan of files within a targeted folder. Folder-based navigation was less commonly used for email and search 
was mentioned more frequently though still as a last resort after other methods of return — scanning and 
sorting of the inbox — had been tried first. Preferences for the return to Web pages were less clear with 
participants reporting the use of both bookmarks and direct search through a search service. 

More recently, Bergman, Beyth-Marom, Nachmias, Gradovitch & Whittaker (2008) report that 
people still prefer folder-based navigation as a means of return to files. Search was a last-resort after other 
methods of return had failed. A strong preference for folder-based navigation re-asserted itself even for 
study participants who were introduced to better search support (Google Desktop) during the course of the 
study. After reporting an initial spike in the use of search, participants reported that their preferences 
settled back on folder-based navigation as the primary method of return. Barreau (2008) also reports a 
persistent preference for navigation (or browsing) as a means of return. In a related finding, Bergman, 
Gradovitch, Bar-Ilan & Beyth-Marom (2013) report a strong preference for the use of folders over tags as 
the primary means of organizing files and emails. 

On the other hand, with now standard availability of fast, index-supported search for email and 
increasing allowances for email storage, there appears to be some tendency away from the use of folders for 
organizing email and a tendency, instead, to leave emails, read and un-read, in the inbox (Whittaker, 
Matthews, Cerruti, Badenes, & Tang, 2011). 

The study described in this article complements previous research in two respects: 


1 See Jones (2013, Chapter 9) for a comparison of clerical tax rates as these have changed over time and are likely to change in the 
future. For samples of “The Future of Personal Information Management”, Parts 1 and 2 (and soon Part 3) visit 
http://keepingfoundthingsfound.com. 

2 The term “browsing” is also commonly used in the literature. The term “navigation” is used in this article to emphasize the purposeful 
nature of the re-finding activity. The person is looking for a specific item of information. 
1. Participants were observed for their actual re-finding behavior as they completed a delayed cued 
recall task. By contrast, data from the Barreau and Nardi, Boardman and Sasse, and Bergman et 
al. studies come from self-reports3. 

2. In the spirit of cross-form research established by Boardman & Sasse, participants were observed 
as they returned not only to files but also to email messages and to web pages. How does the 
observed method of re-finding vary depending upon the kind of information to be re-found? 


2 Method 


Note: The re-finding study described here was part of a much larger longitudinal investigation, conducted 
in 2009, into the ways that people manage information over time. Qualitative results relating to this larger 
investigation have been described elsewhere (Bruce, Wenning, Jones, E., Vinson, & Jones, W., 2010). 
Results presented here are from a more focused, quantitative analysis of this data. 

Seventeen participants (nine male, eight female, ranging in age from 18 to 49) were recruited 
through a Craigslist announcement and through flyers placed around the University of Washington’s main 
campus. Five participants were students at the university (two graduate students); also participating were 
three teachers (high school), two system administrators, three software developers, a journalist, a network 
technician, a systems analyst and a librarian. 

The procedure followed is similar to that used previously to study the re-finding of Web pages 
(Bruce, Jones, & Dumais, 2004). Participants completed a set-up session followed by a test session, two to 
four weeks later. Each session lasted about an hour. The procedure involved personal information — files, 
emails (received) and web pages actually viewed by the participant. 

Given the potentially sensitive nature of this information, special steps were taken to ensure that 
participants had ultimate control over the materials used in the sessions. Participants, under observer 
direction, generated three different lists — one for files, one for email and one for web pages — in an ordering 
that was counter-balanced across participants4. Each list was sorted by date (i.e., received date or last 
accessed date) so that more recent items were topmost. 

The attempt was also made, for each list, to create a sampling of test items that was distributed, 
roughly, over the previous 7-day period. This meant sampling from a list beginning with items viewed 
“Today” until two acceptable test items were selected. Items were then sampled from “Yesterday” until 
two test items were selected and so on until 14 test items had been selected for each form of 
information. (If the next day for sampling had already been reached because of “skips”, as described below, 
then participants continued from their current position in the list.) 
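
The sampling rule can be read as a single pass down the date-sorted list. The sketch below is one interpretation under stated assumptions, not the procedure as it was actually administered: the function name `sample_test_items`, the `(day_offset, acceptable)` encoding of list entries, and the handling of skipped items are all hypothetical.

```python
from collections import Counter

def sample_test_items(items, per_day=2, total=14):
    """Pick up to `per_day` acceptable items per day, walking a list that is
    sorted most recent first, until `total` items have been selected.

    `items` is a list of (day_offset, acceptable) pairs, where day_offset is
    0 for "Today", 1 for "Yesterday", and so on (a hypothetical encoding).
    Returns the positions of the selected items within `items`.
    """
    selected = []
    taken = Counter()  # number of items already selected for each day
    for position, (day_offset, acceptable) in enumerate(items):
        if len(selected) == total:
            break
        if not acceptable:                # the participant skipped this item
            continue
        if taken[day_offset] >= per_day:  # quota for this day already met
            continue
        taken[day_offset] += 1
        selected.append(position)
    return selected
```

With two items per day over seven days, the default arguments yield the 14 items per form described above.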

To select items in a list within a given time period, participants worked through items one by one. 
The observer did not see the items during this stage of the set-up session. Participants were instructed to 
bypass items that, for whatever reason, they preferred not to include in the study. For items not skipped, 
participants rated the likelihood that they would want to re-find this item again over the next twelve 
months. For items where the likelihood was rated as 75% or higher, participants were then asked to briefly 
describe a reason for re-finding the item. Participants were encouraged to be as specific as possible but 
without referencing the item by “name” (e.g., file name, domain name, sender name or subject 
tagline).5 Participants were not told that they would later be tested on these items. 


3 Though Bergman et al. (2008) also provide corroborating evidence from logged data for Linux users. 
4 For email messages, participants were instructed to simply work through the inbox of their primary email account. For web pages, 
participants viewed a listing generated by the history facility of their primary browser. For files, participants were instructed in the 
steps to generate a search of files viewed in the past seven days, ordered from most to least recently viewed. 


5 Examples of reasons included “to use as reference for what wife wants for birthday next year”, “want to change prescription to mail 
order; need to use form attached to email”, “use receipt when filing income taxes next year”. 
Finally, with the participant’s permission, the item was opened in its own window and a screenshot 
was taken of the item. 

The test session involved selected testing of five items randomly chosen from each list (file, email, 
web page). Items from a given list were tested in a block and the ordering of blocks (for file, email, web 
page) was counter-balanced across participants. 
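
The article does not say how items were drawn or how block orders were assigned. A minimal sketch, assuming simple random sampling without replacement and a rotation through the six possible orderings of the three blocks (the names `block_order` and `choose_test_items` are hypothetical):

```python
import random
from itertools import permutations

FORMS = ("file", "email", "web page")

def block_order(participant_index):
    """Cycle through the six possible orderings of the three blocks."""
    orders = list(permutations(FORMS))
    return orders[participant_index % len(orders)]

def choose_test_items(item_list, k=5, seed=None):
    """Randomly choose k distinct items from one participant's list."""
    rng = random.Random(seed)
    return rng.sample(item_list, k)
```

With 17 participants, a rotation of this kind cannot be perfectly balanced: five of the six orderings would be used three times and one only twice.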

For a given item, participants were given the “reason for return” which they had provided during 
the set-up session. With reference to a reason, participants were instructed to imagine a situation “now” 
where the item was needed6. Participants were then asked to re-find this item as quickly as possible by 
whatever method or combination of methods they chose. Participants were instructed to think-aloud as 
they did so.7 

Participants were timed and their method(s) of re-finding were noted. Trials exceeding five minutes 
were stopped and marked as a failure to re-find. All methods used on a trial were recorded and independently 
cross-checked later with reference to an audio recording of the test session. 

Participants worked from their own computer in each session. Operating systems represented on 
participant computers included Windows XP, Windows Vista, and the Macintosh OS X (seven, five and 
five participants, respectively). For participants running under XP, five used MS Outlook as their primary 
interface to email and two used Thunderbird. For participants running under Vista, two used Outlook as 
their primary interface to email; two used Gmail and one used the AOL web client. For participants running 
under the Mac OS, two used Apple Mail as their primary interface to email, two used Gmail and one 
participant used Yahoo Mail. 

Four XP participants used the Microsoft Internet Explorer as their primary web browser; the 
remaining three XP participants used Mozilla FireFox. For Vista users, two used IE as their primary web 
browser, one used FireFox and two used both IE and FireFox. For Mac OS X users, four used FireFox and 
one used the Apple Safari browser. 


3 Results 


As shown in Table 1, the rate of successful return was high across conditions (93%, 88%, 84%, respectively, 
for files, email and web pages). An item was commonly found in three minutes or less. Moreover, success 
was likely to come with the first attempt. Results are consistent with high rates of successful re-finding 
reported by Boardman and Sasse. 


                                          Files   Email   Web Pages 
Eventual success                          93%     88%     84% 
Re-found in 3 minutes or less             90%     83%     81% 
Success with first method of re-finding   78%     57%     83% 


Table 1: Success rates and speed of return were high across information forms. 


Across trials and forms of information, participants made an average of 1.48 attempts before successfully 
re-finding a targeted item. 


6 On those occasions where a participant had no recollection of an item given the reason for return, they were instructed to “give it a 
shot” i.e. by imagining some item they had previously encountered that would fit the “reason for return” and then to return to this 
item. 

7 Near the outset of the session, participants were given practice in thinking aloud by doing so as they created a file (e.g., MS Word 
document), typed “Hello world!” and saved the file in a folder of their choice. Participants were told to describe only the “what” 
of their current thoughts and actions, not the “why”. The observer provided gentle reminders if the participant deviated from the 
instructions or stopped talking for more than 15 to 20 seconds. 



iConference 2014 William Jones et al. 


In some cases, however, participants made several attempts and tried several different methods 
before either locating the targeted item or “timing out”. A look at some of these instances of multiple 
attempts is instructive: 


Participant SX-215, looking for an email, tries search, then search again, then navigation, then 
sorting (success): 


“I'm in Outlook, I'm gonna go into "Look for" and I'm gonna put in, uhh, "currency" and click 
"Find now." I: So the "Look for" is a search bar? R: Yeah. And that didn't find it, so I click "Clear" 
and then I'm gonna put in "Money" and see if that finds it. And that didn't find it. So then I'm 
gonnaing (?) to wonder if I put it in my Money folder. I didn't think I had, so I'm gonna look there. 
I: So you didn't remember that you had the Money folder? R: No, I have it. I just didn't think that 
I had, uhh, filed it away yet. And I definitely haven't filed it away yet, so I'm going back to the 
inbox, and then what the heck was that guy's name? I think this is an email that I sent to myself, 
so I'm gonna sort it by sender and scroll down to my name, where there's probably twenty five 
emails. And there it is. Foreign Exchange.” 


Participant JS-197, looking for a file, tries navigation four separate times (success). 

Participant AP-123, looking for an email, tries sort (by “From” field), then navigation through 
email folders, then search (on person’s name), then search again on the word “schedule” 
(unsuccessful). 

Participant BK-129, looking for a web page, looks first in a “Favorites” drop-down list, clicks one 
favorite link for a jump to “YouTube”, searches within this site twice, then searches from Google 
(successful). 

Participant ET-156, looking for an email, searches the Inbox for “house sitting”, then searches for 
a person’s name followed by a sort of results by “Received” date, followed by another search of the 
Inbox for “dinner” (successful). 


The data summarized in Table 2 show a very large discrepancy between the three conditions with respect 
to the use of search. A planned paired t-test comparison between search as a first method for files vs. email 
was significant (t=4.41, p < 0.001, df=16).8 A second planned comparison between the use of search for 
files vs. web pages was not significant (t=0.18). 


Folder navigation dominates as a first method of return to files. If the use of “recent” lists is also 
classed as a kind of navigation then the dominance of navigation increases to nearly 90%. Folder navigation 
as a first method of return to email (i.e., involving the use of email folders) was much less prevalent. 
When navigation of the inbox is also included, a measure of total navigation increases to just over 42%. 
Even so, in a paired t-test, this measure is significantly lower than total navigation for files (t=3.84, p < 
.002, df=16). 
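
For readers who wish to run the same kind of within-subjects comparison on their own data, the paired t statistic can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study; the function name `paired_t` is hypothetical, and each input is assumed to hold one percentage per participant, in the same participant order.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equal-length samples.

    Raises ZeroDivisionError if every paired difference is identical,
    because the standard error of the differences is then zero.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1
```

With 17 participants the function returns 16 degrees of freedom, which matches the df reported for the paired comparisons above.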


8 Obviously, given the small sample size (17 participants) the power of these tests is low. Since tests involve pairwise comparisons of 
data points collected for the same participants (i.e., the design is within-subjects), statistical significance is not, however, unheard of 
even in such a small sample size and, some might say, all the more “practically significant” when it is observed. 
                    Files             Email             Web Pages 
Search              6.28% (12.74%)    42.35% (34.39%)   8.00% (10.53%) 
Folder Navigation   83.73% (22.57%)   19.61% (29.65%)   27.22% (26.28%) 
Total Navigation    89.42%            42.16%            43.45% 
Other               4.30%             15.49%            48.55% 


Table 2: Mean percentages (and standard deviations)9 for methods used first to re-find files, emails and web 
pages. 
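
The entries of Table 2 are means of per-participant percentages. A minimal sketch of that two-step summary, assuming each participant contributes a list of first-method labels for one information form and that the sample (n - 1) standard deviation is wanted; the article does not say which form of the standard deviation was used, and the function names and labels are hypothetical:

```python
from statistics import mean, stdev

def percent_first_method(trials, method):
    """Percentage of one participant's trials whose first method is `method`."""
    return 100.0 * sum(1 for first in trials if first == method) / len(trials)

def summarize(first_methods_by_participant, method):
    """Mean and sample standard deviation of the per-participant percentages."""
    percentages = [percent_first_method(trials, method)
                   for trials in first_methods_by_participant.values()]
    return mean(percentages), stdev(percentages)
```

Because each participant completed only five trials per form, an individual percentage moves in steps of 20 points, which helps explain why the standard deviations are large relative to the means.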


Included in the “other” category for email is sorting, which occurred in 10.87% of the attempts for email. 
Sorting was most commonly by sender. Of the 10 instances (7 participants) where sorting was a first method 
of return to email, 9 were sorted by sender (the other was sorted by presence or absence of an attachment). 
In the two instances where sorting was used as a second method of return, both were by sender. Sorting 
might be regarded as a variation on search where users must decide which attribute to focus on and must 
have an approximate idea as to the value of the attribute (e.g., Sender name). If sorting is added, then the 
dominance of search is even higher for email — at 53.53% of the trials. 

For web pages, folder navigation equates to the use of a bookmarking facility (through which people 
can save and organize web references), usually as a feature of the browser. Other recognition-heavy methods 
of return that might reasonably be classed as variations in navigation include hyperlink navigation (9.56%), 
the use of a history facility (5.56%), and the occasional happenstance that the target web page is already 
open in a browser window or tab (1.11%). But a total navigation measure that includes these variations is 
still significantly lower than the total navigation measure for files (t=3.92, p < .01, df=16). At an observed 
8%, search as a first method of re-finding a Web page is in line with a previously observed occurrence of 
13% (Bruce et al., 2004). 

The web pages “other” entry includes one very popular method of return to web pages — auto- 
complete. A person types a few characters in the address well and selects from a list of matching web 
addresses that is generated and revised incrementally as the person types. Auto-complete might be regarded 
as a variation of search — albeit, where the search space is highly constrained to include only web addresses 
for pages that a person has visited. However, in over 50% of its uses in the current study, auto-complete 
was used as part of what might be called a compound method of return: Participants used auto-complete 
to get to the right web site. Participants then used hyperlink navigation to click to the targeted page within 
the site. 
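
The incremental matching behind auto-complete can be sketched as a prefix filter over previously visited addresses. This is a simplified illustration under stated assumptions, not a description of how any particular browser works: the history is assumed to be a mapping from address to visit count, matching is assumed to ignore the scheme and a leading “www.”, and the names `normalize` and `suggest` are hypothetical.

```python
def normalize(address):
    """Strip the scheme and a leading "www." so typing can start at the site name."""
    for prefix in ("https://", "http://"):
        if address.startswith(prefix):
            address = address[len(prefix):]
            break
    if address.startswith("www."):
        address = address[len("www."):]
    return address.lower()

def suggest(history, typed, limit=5):
    """Return visited addresses whose normalized form starts with the
    characters typed so far, most frequently visited first."""
    typed = typed.lower()
    matches = [(count, address) for address, count in history.items()
               if normalize(address).startswith(typed)]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [address for _, address in matches[:limit]]
```

Calling `suggest` again after each keystroke narrows the list, mirroring the way suggestions are revised as the person types. In the compound method, the selected address only brings the person to the right site; the final steps to the targeted page are taken by following hyperlinks.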

Participant comments attest to the popularity of auto-complete: 


ET-156: Umm I’m going to guess that auto complete is gonna find the site for me. And it did not. 
Umm so I’m gonna continue typing the URL from memory and hit enter. 


EW-191 - Umm, there's only a few sites that I go to regularly, and they, umm, just because they're 
visited often, if I start typing them in, they'll finish for themselves, so. 


What effect did a participant’s operating system and its level of search support have on a participant’s 
overall tendency to use search as a first method of re-finding? A simple non-paired comparison between 
participants with limited search support (Windows XP) and participants with advanced search support 
(Windows Vista and Macintosh OS X) reveals an overall tendency to use search more when search support 
is more advanced (t=1.83, p < .05, df=15). 
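
A between-groups comparison of this kind can be sketched with Student's two-sample t statistic. The sketch assumes a pooled-variance test, which is consistent with the reported df of 15 for groups of 7 and 10 participants, although the article does not name the variant used; the function name `pooled_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weight each group's sample variance by its own degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```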


9 Note that since measurements themselves are percentages, the means and standard deviations are also expressed as percentages. 
% Search   1st attempt   2nd attempt   3rd attempt   4th attempt 
Files      6%            53%           42%           50% 
Email      42%           62%           60%           67% 
Web        8%            33%           25%           100% 


Table 3: The likelihood of search with each attempt. 
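
One way to derive figures like those in Table 3 from trial records is sketched below. It assumes that each trial is stored as the ordered list of methods tried and that rates are pooled across trials rather than averaged per participant; the article does not specify which was done, and the function name and method labels are hypothetical.

```python
from collections import defaultdict

def search_rate_by_attempt(trials):
    """Map attempt number (1, 2, ...) to the percentage of attempts at that
    position that used search. Each trial is the ordered list of methods tried."""
    used_search = defaultdict(int)
    attempts = defaultdict(int)
    for methods in trials:
        for attempt_number, method in enumerate(methods, start=1):
            attempts[attempt_number] += 1
            if method == "search":
                used_search[attempt_number] += 1
    return {k: 100.0 * used_search[k] / attempts[k] for k in sorted(attempts)}
```

Only trials that continue past earlier attempts contribute to the later columns, so percentages for third and fourth attempts rest on fewer observations than those for the first.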


Table 3 suggests an increasing tendency to use search on subsequent re-finding attempts. How did a person’s 
tendencies to “file” and organize impact their choice of re-finding method? We address this question by 
following the terminology of Boardman and Sasse but with adjustments according to the information form 
involved: 


Total Filers 


Files/Documents. Organize all or almost all of their files immediately upon creation. Make extensive 
use of folder hierarchy. No uncategorized files anywhere with rare exceptions. 

Emails. All incoming email is organized in some way; inbox contains zero messages at end of most 
days or all messages are tagged in some way, with rare exceptions. 

Web pages. All web references are organized or tagged according to some scheme. Extensive 
hierarchy may be present for large collections. 


Extensive Filers 


Files/Documents. Organize extensively, but leave some items unfiled (allow file to be saved 
according to system default of desktop, My Documents, etc.). Extensive hierarchy of files. (see 
Figure 1). 

Emails. Try to organize many messages on a daily basis. Inbox generally contains to-do items or 
does not extend beyond one page. 

Web pages. Organize many bookmarks as they are created or at the end of a browsing session. Most 
web references are organized in some way. Folders used but may have a quarter to half of bookmarks 
uncategorized. 


Figure 1. A portion of an “Extensive Filer’s” organization of files. 
Occasional Filers 


Files/Documents. Organize occasionally in spurts or during “spring cleaning”; leave most items 
unorganized and have relatively few folders. Rely heavily on system default location for saving 
electronic files. 

Emails. Organize only a few messages on a daily basis or do occasional “spring cleaning” to organize 
inbox/email messages. Inbox generally contains hundreds of uncategorized messages. 

Web pages. Organize bookmarks sporadically. Some organization of bookmarks exists but most are 
uncategorized. Bookmarks are only occasionally organized or deleted. 


No Filers 


Files/Documents. Do not organize any electronic files; all files are in one location such as the 
desktop or immediately under a default folder (e.g. “My Documents”). Rely on search or other 
method to find documents. 

Emails. Do not organize or tag any messages. No use of folders or labels. Inbox contains potentially 
hundreds or thousands of messages. Rely entirely on search for finding emails. 

Web pages. Don’t bookmark or otherwise keep references to web pages. Rely instead on return by 
Web search. If bookmarks are created, these are not organized. 


                    Files   Email   Web Pages 
Total Filers        0       0       0 
Extensive Filers    6       6       7 
Occasional Filers   9       8       5 
No Filers           0       3       2 
Undetermined        2       0       3 


Table 4: The number of participants classified in each of the "organizing tendency" categories, by 
information form. 


                              Files                            Email                             Web Pages 
Higher tendency to organize   2.8% Search; 71.7% Navigation    13.9% Search; 71.7% Navigation    5.2% Search; 10.5% Navigation 
Lower tendency to organize    8.2% Search; 90% Navigation      57.9% Search; 26.1% Navigation    4.8% Search; 7.6% Navigation 

Table 5: The use of search and navigation as first methods appears to vary depending upon organizing 
tendency and is especially pronounced for email.10 


Table 4 provides a breakdown of participants by “organizing tendency” and information form. 
Table 5 provides a breakdown of search vs. navigation across different forms according to whether 
a person has a higher tendency to organize (Total Filers, Extensive Filers) or a lower tendency to organize 
(Occasional Filers, No Filers). The strongest apparent difference between higher and lower organizers is for 
email. Not surprisingly, perhaps, people who are more likely to organize are more likely to navigate. 


10 Note that in none of the cells of the table do percentages sum to 100%, most notably not in the Web column. This is because so 
many of the methods of return observed resist classification as either “search” or “navigation”. 
4 Discussion 


Consistent with previous research, there is a strong preference for folder-based navigation as a first method 
to re-find a file. However, the picture changes when the target of a re-finding attempt is an email or a web 
page. 

Better search support does matter. Participants are more likely to choose search under Vista and 
Mac OS X, where search support — notably the creation and seamless maintenance of an index that greatly 
speeds search — is built into the operating system. The greater inclination to search is seen across information 
forms but appears especially strong for email. 

Even so, an overall preference for navigation persists that is especially strong for files. Why? We 
can consider two hypotheses. 

A “location is better” hypothesis follows from a consideration of the affordances for location and 
a sense of place in both our physical and digital worlds. Bergman et al. (2008) note the power of the location 
metaphor. Teevan et al. (2004) point to a general importance for a sense of location. 

Barreau and Nardi note a reminding value for location-based finding. Related to this is a value of 
serendipity: along the path to sought-for information, other information of relevance may be encountered 
that might otherwise be overlooked. 

As Teevan et al. note, an “orienteering” method of information access also enables a more stepwise 
progression towards desired information in which the expression of each step is relatively easy and, if a 
wrong step is taken, corrective backtracking is also easy. 

There are related notions that recognition is generally easier than recall (Lansdale, 1988) and that 
the memory for how to access information is sometimes more “in our movements”, so to speak, as procedural 
knowledge rather than “in our words” as declarative knowledge (Bergman et al., 2008). 

In some cases, we can’t know whether we have the right information item without the ability to 
inspect a larger context in which the item occurs (Teevan et al., 2004) — as we do routinely, for example, 
to locate the “correct” version of a document. 

Finally, a sense of digital place with respect to a personal filing system is closely related to notions 
of familiarity, control and organization (Jones, Phuwanartnurak, Gill & Bruce, 2005). As people organize 
their digital information, this is most likely to be manifest in a personal file system (Boardman & Sasse, 
2004). People express a sense of control, ownership and even pride (e.g., Boardman & Sasse, 2004, p. 585) 
concerning their files and their organization. 

At the same time, results demonstrate that people do search, especially as a method of return 
to email messages. How to account for this observation under the location-is-better hypothesis? 

One rejoinder is to note that a constant stream of incoming email messages where the new 
automatically displace the old is profoundly disruptive for any sense of place. Other interfaces for email 
might induce a greater sense of place (Whittaker, Bellotti, & Gwizdka, 2007). And, in a slightly broader 
context, we can see the use of filtering techniques (that automatically sort email into different folders), 
blogs, the “Wall” of Facebook and even the use of multiple email accounts (one for business, one for friends 
and family, one for placing orders...) as variations on an attempt to impose a greater sense of location on 
email. 

Notwithstanding the many useful features of location and the special challenges that “email 
overload” poses for any supporting tool, location-based or otherwise, we should not overlook a simpler 
explanation of results offered by a “first-impressions” hypothesis. The first-impressions hypothesis is 
essentially an alternate expression for the robust and ubiquitous primacy effect repeatedly observed in 
studies of cognitive psychology (Neisser, 1967). 

Under this hypothesis, the method of re-finding follows from previously successful retrievals of the 
information and, ultimately, from an initial encounter when the information is created or otherwise 
experienced for the first time. People re-find files by navigating through folders because, in a traditional 
filing system, people often specify the folder location of a file to begin with (even if this is to the desktop 
or a top-level folder such as “My Documents”). By the same reasoning, people pay special attention to properties 
of an email message such as sender or subject because these properties are used initially to screen and 
prioritize email messages. These same properties are later likely to be recalled during a retrieval attempt 
(Elsweiler et al., 2008). 

People are observed to create email folders for email sent from specific people or pertaining to 
specific topics (Boardman & Sasse, 2004). But, when fast search (and other features such as grouping by 
conversational thread) support a comparable grouping by remembered attributes then why bother with 
folders? Why bother, in particular, when the flood of incoming email messages continues to increase? 

Both hypotheses have their validity. Both carry implications for system design. 

Consider, for example, that the debate over the merits and eventual dominance of search has focused 
largely on the act of re-finding (Barreau & Nardi, 1995; Fertig, Freeman, & Gelernter, 1996; Nardi & 
Barreau, 1997). 

But, in the spirit of the life-cycle emphasis of PIM (Jones, 2007), the first-impressions hypothesis 
prompts a shift backwards in time to the initial encounter with an information item and to the decisions 
made and actions taken during an initial keeping stage of PIM. Support for the tagging and annotation of 
information items remains basic and fragmented. Perhaps, with unified support for tagging and the 
application of searchable annotations, we can finally realize a “placeless” means of keeping (Dourish, 
Edwards, LaMarca, & Salisbury, 1999) that readily couples with a predominantly search-based means of 
re-finding.¹¹

On the other hand, following a location-is-better hypothesis, we might continue to explore ways to 
situate email — especially in the context of a larger effort to manage tasks and projects (Jones, Hou, 
Sethanandha, Bi & Gemmell, 2010). 

Suppose there is a best of both worlds? Suppose that these hypotheses aren’t really in opposition 
to each other? Consider, for example, the folder hierarchy that provides a basis for the placement of and 
subsequent navigation back to information items. This same hierarchy might be “flipped” so that folders 
become tags or labels to be applied to the information item (file, email, web page) currently in view. Our 
attention stays (more or less) on this item even as we label or “place” this item for anticipated future uses.
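
The “flipped” hierarchy can be illustrated with a minimal sketch. It assumes, purely for illustration, that each item is known by a relative path and that every folder on that path becomes one tag; the function names and sample paths are hypothetical and do not describe any system used in the study.

```python
from pathlib import PurePosixPath


def folder_tags(path: str) -> list[str]:
    """Return every folder on an item's relative path as a tag, outermost first."""
    return list(PurePosixPath(path).parent.parts)


def build_tag_index(paths: list[str]) -> dict[str, set[str]]:
    """Map each tag (a former folder name) to the names of the items that carry it."""
    index: dict[str, set[str]] = {}
    for p in paths:
        item_name = PurePosixPath(p).name
        for tag in folder_tags(p):
            index.setdefault(tag, set()).add(item_name)
    return index


# Hypothetical items: one nested two folders deep, one nested one folder deep.
index = build_tag_index(["Projects/iConf/paper.doc", "Projects/budget.xls"])
```

Under these assumptions an item filed two folders deep carries two tags, so a search on either folder name retrieves it while the original navigation path remains recoverable from the ordered tag list.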

Or consider the popularity of direct entry (“auto-complete”) as a method of choice for the return 
to web information. On the one hand, this method can be regarded as a basic search (i.e., search focused 
on the text in addresses for Web pages previously visited). In some cases, the targeted page is directly 
returned through selection of a matching address. But in many other cases, auto-complete is only the first 
portion of a compound re-finding method: Search is a way to “parachute”, we might say, into the right 
region of a Web space. The person then navigates (or possibly searches locally) to reach the targeted item. 
In real situations of information access — web-based or not — the target may, in fact, not be a specific 
information item but rather a fuzzier region of information. 


5 Conclusion 


Consistent with other research, the current study observes a strong preference for folder-based navigation 
to re-find files. However, a different pattern is observed for other forms of information — most notably email, 
for which the inclination to use search as a first method is roughly on par with an aggregate measure that 
includes both inbox scans and folder-based navigation. 

The data reported here were collected in 2009 at a time when local system-level (default), trouble-free
(mostly) support for fast, index-supported search was present in the computing environments of some
participants (e.g. those with Windows Vista and Macintosh OS X) but not others (e.g. those with
Windows XP). Participant computing environments, even in a relatively small sample of 17, differed in
many other respects as well including, especially, with respect to email client (MS Outlook, Thunderbird,
Gmail, AOL, Apple Mail, Yahoo mail) and web browser (IE, Firefox, Safari or, for some participants, a
combination of these).


11 For example, Mac users have the option to add comments to files that can then be seen when browsing these files:
http://mactips.info/wp-content/uploads/2009/07/mt-comments-view.jpg. These comments can also be searched using
search tools such as Spotlight. One participant in the study reported searching on comments as a means of return to files.

As of this paper’s publication, our computing worlds are already considerably changed from the 
computing worlds of study participants. Had the data been collected more recently or, even, “now”, we’d 
still find ourselves at a comparable remove from the data should we read this paper a few years from “now”. 
If not a separation of time, then the diversity of computing environments might place us at comparable 
distance from the data, e.g., “if I use or care about system/app/device X where things are done the _right_
way, why should I bother with data gathered for users of system/app/device Y (where things are _clearly_
done the _wrong_ way)?”.

What relevance does the data have to us? If indeed people still prefer to navigate to files even as 
they increasingly use search for email, so what? Maybe a preference for navigation is enduring. Or maybe 
we'll come gradually to use search as a first method of return to personal files. This might happen as 
personal (and personalized) search continues to improve. Or also as support for “search-facilitating” keeping 
methods (e.g. tagging) continues to improve. Or, indeed, as the sheer amount of what is “ours” continues 
to increase — as stored locally on our various devices and, increasingly, in the “cloud”. But again, so what? 

Consider that the value of the study described may be primarily in its choice of methods as these 
follow from two basic tenets of research in personal information management. First, it is important to 
consider a diversity of information forms. Results obtained for a single form such as files (or email or web 
pages) can mislead us in our attempts to generalize or to draw implications for design. Second, activities 
such as re-finding need to be considered in a larger lifecycle context that includes, for example, initial 
encounters with and organization of information that is later targeted in re-finding. 

People exhibit a bewildering diversity of PIM behaviors.¹² Observed differences might be for any of
several reasons: personality, experience, education, operating system, device, etc. Pity the poor academic
researcher who lacks the resources to cover or control for the number of factors that _might_ matter! At 
the same time, if we attempt to abstract away from this diversity we risk taking the “P” out of PIM. 

But we begin to get a handle on diversity as observed variations in PIM behavior (an inclination 
to search, for example) occur within-subjects. When the same participant exhibits different preferences in
re-finding method depending upon information form, we can more confidently focus on a comparison of 
these forms, their supporting applications and the circumstances of creation and use for items of different 
forms. Why one way for files and another way for email? What implications can we draw for design? 

The current study, like many other “field studies” in PIM, is exploratory in its efforts to identify 
and begin to understand the reasons for an observed diversity in the methods people use for PIM (and, 
specifically in this study, methods of re-finding). From these explorations we derive implications for the 
design of supporting tools and techniques of PIM. The study reaffirms a point made in other studies (as 
reviewed in the Introduction) that better support for re-finding should be “multi-flavored” in its support 
for techniques of both search and navigation. Moreover, support for re-finding may properly start much 
earlier in the lifecycle of information — with better support for the initial keeping of this information. 


12 Observed differences reported in this paper are for re-finding behavior. Comparable differences have been observed elsewhere in
keeping behavior (e.g. Jones, Dumais & Bruce, 2002).


Abstract

This paper summarizes some of the themes of my forthcoming book, Indexing it All: The Subject in the
Age of Documentation, Information, and Data (MIT Press). The paper presents research
on the history and theory of the modern documentary tradition in the 20th and into the 21st centuries,
which it views as an episteme with three dominant moments: European Documentation, Information 
Science and Data Science. In this paper, European Documentation, citation analysis, social computing, 
android robotics, and social big data are discussed as cases in the dialectical movement of this tradition. 
Each stage of this development represents higher levels in the subsumption of human agents and texts 
within increasingly abstract documentary forms of representation. Documentary indexing and 
indexicality have been major and increasing sources for the social positioning of persons in modernity, 
with consequences for personal and social psychologies, politics, and for critique and judgment. The story 
of the modern documentary tradition is a story of the role of indexing (personal, social, and textual 
positioning through documentary techniques and technologies) and indexicality (the modes of 
documentary citation and reference that result in such), and how this has shaped and continues to shape 
what Suzanne Briet termed “homo documentator” (Briet, 2006).


1 Introduction


“This definition [of documentation, as evidence] has often been countered by linguists and 
philosophers, who are necessarily infatuated with minutia and logic. Thanks to their analysis of the 
content of this idea, one can propose here a definition, which may be, at the present time, the most 
accurate, but is also the most abstract, and thus, the least accessible: ‘all concrete or symbolic
indexical sign [indice], preserved or recorded toward the ends of representing, of reconstituting, or
of proving a physical or intellectual phenomenon.’”


“‘Homo documentator’ is born out of new conditions of research and technique [technique].”
— Suzanne Briet, What is Documentation? (Briet, 2006) 


It is often assumed that information science and data science introduce new phases in the history of 
information, replacing the book form and introducing new, more flexible, global information services and 
communication into a world of book and documentary modernism. While these are certainly true claims,
we would be remiss, however, not to recognize more fully, and even bring into critique, the continuity of this
progress. For the story of the documentary tradition, particularly in the last century and into the current
one, is not that of documentation as one tradition and information and data as other traditions, but rather
of an historical continuity in techniques, methods, and the ideology of documentary processes as socio-technical
norms. Indeed, what we are witnessing is not the end of the documentary tradition, but rather,
the qualitative intensification of this tradition and the socio-technical transformation of its tools, from that 
of being embedded in transcendental professional organizations (libraries, classification, cataloging, and 
librarians) to infrastructural mediators of everyday life (link-analysis ranking, social network algorithms, 
recommender systems). 

Indeed, from one angle (which will be the method of this paper) we may view this progress as
the progress of a dialectic, namely, as an aufheben, a sublation (or let us use the Marxian term, subsumption,
since we are largely dealing with a material ‘gathering up’ and ‘uplifting’ [aufheben] into virtual tools for
mediation). This subsumptive process of the modern documentary tradition (which in this paper I discuss
roughly from the beginning of the 20th century to our current time) is, from a certain historical perspective,
the dominant Zeitgeist of our age. Thus, the critical perspective that we can offer in this paper is that of
marking this moment as a totalizing moment, but one that leaves remainders, which are increasingly left out
(not only in epistemological and technical senses but, even more importantly, in political and social senses)
of the literal counting, processing, and indexing of knowledge, technology, politics, and professional and
everyday social and cultural being. To open up such a critical moment follows the path of Enlightenment 
critique, in the sense that it opens up an historical caesura within what is celebrated as the given (Foucault, 
1984; Kant, 2009). 

What follows is an epistemological-historical overview of five historical socio-technical cases 
(European documentation, citation analysis, social computing, android robotics, and social big data), which 
fit within three moments of the documentary tradition. This tradition is treated as an ‘episteme’ (to use 
Foucault’s term for epochs of dominant socio-technical devices) and these smaller moments and their 
sciences (documentation, information, and data sciences) may be seen as smaller epistemes within this 
tradition. This paper theorizes the subsumption of the personal and textual objects within a documentary 
episteme, eventually resulting in the reduction of both to being mutually conjoined data points within 
surveillance and predictive algorithms and socio-technical modes of governmentality. 

In each of the epistemological-historical stages that I present, documentary techniques become more 
embedded, not only as mediating inter-subjective transactions, but also in mediating self-identity. This 
double mediation constitutes a self-reinforcing cycle of treating others and treating one’s self as known or 
knowable identities — as increasingly reductive, but also increasingly conjoined representations — through 
documentary mediation. Not only documentary techniques, their logical algorithms, and their indexes lead 
to this, but also more recent historical trends in political economy in modernity and in neoliberalism, which 
view the self as an entity of psychological and social positioning within markets and trends. (As I will 
explain, this eventually translates to viewing selves as data points within known parameters over time and 
extrapolating or interpolating further characteristics from such for both social big data and personal
assistant applications.)

Hegelian dialectics constitutes one analytical means by which we may chart not only the 
documentary objectification of subjects and the ‘subjectification’ of documentary objects, but also their 
eventual mutual mediation as documentary representations or data points, not only statically, but over
time, as well. This ‘total subsumption’ of all beings and natural and social objects constitutes the historical 
movement of modernity in the form of the documentary tradition. 

The argument that I will be making is that the technologies and techniques, along with the methods 
and the organizations, of the modern documentary tradition, in dialectical interaction with the ideological 
sphere of late capital, work toward the increasing documentary indexing and the social positioning of 
individuals as subjects of documentary mediation. Increasingly as well, as the quotes from Suzanne Briet in 
the beginning of this article suggest, social life and personal value are founded upon documentary indexing 
and the indexicality of social positioning following this. The story of the modern documentary tradition is 
a history of the role of indexing (personal, social, and textual positioning through documentary techniques
and technologies) and indexicality (the modes of documentary citation and reference that result in such),
and how this has shaped and continues to shape what Suzanne Briet termed “homo documentator” (Briet,
2006).

Following social ‘positioning theory’ in Rom Harré’s work (Harré, 1989), I take the term ‘self’ to 
refer to acts based on hypothetical sets of skill-based potentialities and I take the term ‘person’ to refer to
acts based on socially and culturally recognized rules and roles. The expressions of the self, as with other 
higher order living beings, are done through the affordances of cultural, social and physical materials (e.g., 
language or other semantic entities, rules for their socially meaningful deployment, and the body). 

If I am correct in stating that the documentary tradition has increasingly been transforming ‘selves’ 
into ‘persons’ in the manner that these terms are defined above, and that modernity is progressing in its 
increasing normativization of the self through the mediation of information and communication technologies, 
then this represents a great change in the Western conception of the self, at least since the Enlightenment. 
This would have political stakes, as well, since Enlightenment thought takes the self as a central agency in 
the social and even the governmental state, in so far as the self is seen as the driver of historical change 
and stability through concepts of freedom, foremost of course, freedom of expression and with this, freedom 
of choosing. Such ‘innate’ properties of the self are taken as natural rights, belonging to the very concept 
of individuals as (generally) persons, and are encoded in state constitutions and other formal laws of the
18th century (the U.S. Constitution and the French Declaration of the Rights of Man and of the Citizen).
As we will return to at the end of this article, Kant argued (Kant, 2009) that such a conception of self 
marked a break from dogmatic and authoritarian philosophy and political governance. A return to a stronger 
sense of socially normative psychology as structuring the self, then, would bring the Western norm of 
personal psychology closer to that in non-Western cultures, particularly in Asia. The ‘bridging’ of these 
cultural psychologies by documentary techniques and technologies is reinforced by shared, ‘global,’ social 
norms in political and governmental economy (e.g., neoliberal notions of markets and selves as competitive 
identities in those markets (Foucault, 2008)) and complementary forms of state capitalism and the 
surveillance states that go along with maintaining such global orders of class. The surveillance state now 
spans democratic and authoritarian societies (though of course this doesn’t mean that there aren’t 
differences involved). ‘Global communication’ by means of ICTs then involves not only 
technical/technological mediations, but also socio-cultural mediations, which have been accomplished by 
neoliberal class and identity formations across different cultural horizons. 

In the conclusion to this paper I would like to discuss some of the issues that remain, despite this 
subsumption, involving the status of the self and issues of critique that remain as part of the psychological 
and political lineage of the Western Enlightenment. The story of the documentary ‘indexing’ of selves as 
persons through computer mediation, not only across technical, but also across cultural and social 
‘platforms’ as mediators of expression and the social construction of communicational infrastructures has 
hardly been discussed in the literature. 


2 The modern documentary mediation of others 


Let us start with a rather startling quote from the father of European documentation, Paul Otlet, in 1903. 
It is often assumed that the ‘information age’ is rather new and that the issues that I raised above are 
rather new, as well, but in the quote that we will examine documentary mediation is seen as restructuring 
that old trope of the book-friend, so beloved in the modern age of the book, and in this instrumental 
mediation of texts a restructuring of friendship itself takes place. Here, we must remember that throughout 
Otlet’s work the concept of ‘the book’ refers to the material specificity of books, most particularly, and to 
documents more generally. Otlet’s quote, then, is well on the way to recharacterizing not only books, but 
all forms of documentation, as well as friendship itself, as instrumental tools for serving ‘information needs.’

The quote is from a 1903 publication, “Les sciences bibliographiques et la documentation”
(translated by W. Boyd Rayward as “The Science of Bibliography and Documentation”), published in the
Bulletin of the International Institute of Bibliography (number 8); the Institute, founded by Otlet and his
colleague Henri La Fontaine, would later become the International Federation for Information and
Documentation (Rayward, 1994):


Today, there exist collections of books comprising more than two million volumes and whose annual 
accessions are more than one hundred thousand volumes. They have had to come to grips with 
quite new problems arising, on the one hand, from difficulties of storage, classification and 
circulation of such tremendous masses of materials situated in the centres of large cities, and on 
the other hand, from new ideas within the research community about what it should be able to 
gain from such resources. Once, one read; today one refers to, checks through, skims. Vita brevis 
ars longa! There is too much to read; the times are wrong; the trend is no longer slavishly to follow 
the author through the maze of a personal plan which he has outlined for himself and which, in 
vain, he attempts to impose on those who read him. 


Works are referred to, that is to say, one turns to them to ask for a reply to very precise, specialized 
questions. The reply found, one parts company, ungratefully no doubt but certainly for a thousand 
good reasons, from the obliging friend who has just given such good service. It rarely happens that 
an adequate reply is found in a single book and that it is not necessary to obtain such a reply from 
a combination of partial answers provided by a variety of works. Thus arises the necessity of having 
available great quantities of works, as many as possible; thus, also, the obligation of not 
systematically eliminating any work from book collections because little importance or value is 
attributed to it. Who can make a pronouncement on the usefulness or uselessness of a document 
when so many interpretations of the same text are possible, when so many former truths are 
recognized as wrong today, when so many accepted facts have been modified by more recent 
discoveries; when, in the present anarchy of intellectual production, so few questions have been 
dealt with exhaustively by a single author; and when, so often, it is necessary to be content with a
half-truth or run the risk of remaining in a state of complete ignorance?


The number of works which libraries contain increases the need for documentation, just as organs 
develop functions. This need, in its turn, acts strongly on the necessary enlargement of collections 
of books. But this process cannot be confined to the realm of large libraries. It spreads beyond them 
through the diffusion of the works themselves. More reliable, better arranged, more up-to-date 
books can be produced because of the improved bibliographical apparatus of these libraries. Such 
books become models that, naturally, intellectual workers, who otherwise only have access to 
inferior bibliographical equipment, wish to imitate and surpass. Such books lead us to pose very 
clearly the problem of documentation in relation to libraries of the second rank. 


(Otlet, 1990) 


In this quote, one can clearly see the instrumentalization of the 19th-century hermeneutics (and psychology) of
reading, as well as the instrumentalization of friendship. If friends could be said to be an ‘open book’ to 
their partner if only because the partner had the patience and courtesy of reading such, then Otlet’s friend 
is that of an information deliverer to the needs of the user-friend. 

What is important here is the shift in the notion of what a text is, as well as what a friend is. The 
mutual opening of friends to one another, and correspondingly, the lengthy opening of a text to the reading 
of the reader is transformed in Otlet’s quote to an exchange of meanings, corresponding to the a priori 
needs of a user. Otlet’s quote challenges, in particular, 19th-century German humanism in its analogical
reduction of textual and psychological hermeneutics, reframing this analogy within instrumentalist and
positivist epistemology and morality. Moreover, Otlet’s book-friend universe is promiscuous rather
than discrete, corresponding to large libraries and, as Otlet himself notes, to the production of other
documents themselves through bibliographical or, more broadly, documentary devices and institutions 
(“But this process cannot be confined to the realm of large libraries”). The book, as the friend, is not seen 
as a singularity that is encountered, as a self that has its own unique alterity in relation to the subject, but 
rather, the book-friend is an ‘information source’ corresponding to the user’s own needs. 

But, are these ‘information needs,’ as we now call such, the subject’s own, and if every subject is 
an information object for another subject, then how is the inquiring subject’s own sense of being configured 
in regard to this need? For the information ecology that the subject is immersed in cannot simply be his or 
her own, with his or her own ‘private language’ (as this term is understood in the philosophy of mind, 
referring to the possibility of there being completely idiosyncratic languages that a self possesses, not 
understandable by any other being). If an information need is simply my own, then there would be no 
possibility for the correspondence of information subjects and objects, people and documents, that is, for 
the ‘fulfilling’ of information needs. 

On the one hand, the documentary epistemology that stretches from Otlet through the later 20th-century
Library and Information Science (LIS) discourse of ‘information needs’ and their fulfillment
acknowledges personal needs, but on the other hand, it locates their fulfillment in documentary universes 
within which ‘aboutness’ or ‘information’ (what in the documentary tradition is seen as the ‘evidence’ 
(Buckland, 1991, 1997) or ‘content’ of a text or other information object) have been identified by 
bibliographic indicators and other metalanguage and metadata (classification numbers, cataloging terms, 
descriptors, etc.). Indeed, as any reference librarian can attest, the ‘needs’ of the user become ‘clearer’ once 
some material on a topic has been given and the person can further work through the library collection and 
its tools. This example, alone, should tell us that ‘information needs’ and their fulfillment are not a matter 
of the incomplete correspondence of an ‘unconscious,’ poorly formed, or confused idea in a personal mind 
and that of a documentary/informational object (or in most cases, its metadata representation), but rather,
this phenomenon is a product of people doing information seeking tasks through slowly matching up their 
language of what they need to do with the language of material in a library or other setting. In this, fulfilling 
information needs is similar to looking for a packaged item in a foreign language supermarket and choosing 
the item that most closely corresponds to what one thinks one may be looking for. Often, one is right, not 
because there is an essential correspondence between the seeker’s idea and the item, but because there is a 
pragmatic agreement that, for example, yes, indeed, this item tastes like what I call, for example, ‘figs’ in 
English. 

Indeed, at least in the domain of relatively unknown topics, this searching for the object involves 
the searching first of all for the subject’s own needs in the context of the documentary system. This latter 
is a search for vocabulary (linguistic, visual, etc.). This was the task of Nicholas J. Belkin’s ASK (Anomalous 
State of Knowledge) systems, which helped give rise to the ‘user’ or ‘cognitive’ turn in Library and
Information Science (as distinct from a document retrieval perspective). This is a great and very important 
insight, which some of LIS’s information seeking literature, as well as critics (including some of my past 
work) of ‘user studies’ in LIS largely misunderstood. Information needs do not belong to the subject alone, 
but to the context surrounding both the user and the documents, and it is within the universe of documents 
and those surrogates of ‘information’ that are said to belong to them that the subject must position him- or
herself and so find their needs in the world of their search. Within ordinary language, this universe
can be both that of particular discourses and as broad as all understandable language by speakers. Within 
documentary systems, it is often a subset of language, identified as important by either human or automated 
indexes. The situation of the user in relation to documents is not that of a classic cognitivist agent (in 
classic cognitive science, artificial intelligence, etc.), but rather, something akin to the Lacanian subject 
within the field of language. The human subject finds itself as a subject of desire (or here, need) by its
location among documentary topics (also called ‘subjects’), and it is this positioning that in more current
times gives to the person a documentary subjectivity that he or she deploys as an ‘online identity’ and 
which guides the person’s further searches through their use of recursive algorithms. 


3 Citation indexing and analysis 


If we wish to take an historical perspective, we might see the next step beyond Otlet’s instrumentalization 
of the document-book, which results in the institutionalization of the subject’s desires as needs in 
documentary systems, as the further documentary systemization of the subject as a moment in documentary 
systems. This next step involves the further abstraction of Otlet’s transformation of individual people and 
texts (as users and documents) into both becoming ‘information.’ This corresponds to the post-war 
transformation of documentation studies into information science and it begins with the documentary 
indexing of social position with automated citation indexing. 

This transformation took place through both a technological and an epistemological transformation, 
particularly in the 1960s and 1970s when documentation retrieval gave way to information retrieval. The 
evolution of documentary retrieval to information retrieval was idealist, of course, in so far as it premised 
the ability to retrieve items based on the aboutness of texts, but it was also a natural consequence of relying 
on technical systems that attempted to represent documentary items by abstractions of what was seen as 
their most important ‘content’ (i.e., their ‘information,’ as understood by metalanguage, metadata, and 
documentary abstracts). These technical systems began using what were and still are classical 
transcendental or structural ontologies and taxonomies that were information professional tools 
(classification, cataloging terms, etc.) as, now, infrastructural mediators for retrieval. These mediators have
become increasingly important as measuring devices for personal value in professional (e.g., scientific
and academic) and general (e.g., online communities) social systems, but they have also become more 
omnipotent and invisible algorithmic and indexical mediators in everyday social and personal being and 
worth. 

An important moment in the post-war infrastructuralization of documentary techniques, 
technologies, and institutions occurred in scholarly citation indexing and analysis, best represented by 
Eugene Garfield’s development of citation indexing systems based on, but computationally enabling, the 
citation analysis systems of earlier paper forms. The Institute for Scientific Information (ISI) (which Garfield 
founded in 1960) and its subsequent Science, Social Science, and Arts and Humanities Citation Indexes 
(more recently joined by Scopus and Google Scholar) have had a strong influence upon determining scholarly
behavior through their citation indexing and ranking of scholars and journals, particularly in the natural and
the social sciences. 

Scholarly citation indexing, and the types of scholarly behavioral analysis that follow from it,
are possible because they ‘objectively’ inscribe persons as informational
subjects/objects by evidentiary or documentary systems. A person is identified as an ‘author’ and is
bibliometrically ranked as such within whatever class of documents and documentary topics that the system 
indexes. The behaviors that such systems purport to track are actually behaviors that longitudinally are 
self-defining through the interaction of users with the algorithms and the indexing parameters of the system 
itself (Day, 2013). Indexing and search algorithms form a cybernetic system of control upon expressions of 
information needs and information identification, which may lead to the redefining of the human agent as 
a documentary subject. The ideological norms for fields and sciences entrap both texts and then human 
agents within object and subject documentary (i.e., representational) identifications, in turn, historically 
strengthening the ideological parameters that define the range and ranking of the relation of people and 
texts as documents. 

Citation analyses show scholarly behavior within the confines of the parameters of the algorithms 
and indexes that measure them, and so, like many social science analyses, are somewhat tautological in 
what they show. These bibliographic/bibliometric structures, technologies, and techniques offer certain 
understandings of people and texts as authors, documents, citations, etc. For example, in most systems, 
bibliographic references are counted but acknowledgments are not (Cronin, 1995), some journals 
are not indexed, chapters of multi-authored books are often not counted, and so forth. (The arts and 
humanities remain poorly indexed by citation systems, and so citation analysis methods, such as the 
h-index, are not used as often or with as much importance in the humanities as compared to the social and 
natural sciences in North American academic evaluations, at least.) Author rankings, like journal rankings, 
may be self-reinforcing over time, not due to being effects of some quasi-natural ‘laws’ of bibliometric 
universes (e.g., Bradford’s law of impact, Lotka’s law of productivity), but rather, due to sociological 
tendencies and the privileging of certain of these tendencies over others by the value systems that must be 
implicit in citation indexes in order for them to be seen as not just technically, but socially, valuable. Indeed, 
citation analysis measures human behavior and bibliometric behavior as a consequence of this, but the 
human behavior that is measured over time becomes a consequence of the bibliometric/documentary 
systems as these systems are given central governance controls over social environments (for example, 
through academic evaluation procedures that prioritize the results of citation analysis tools). 
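
The dependence of a bibliometric measure on what the index chooses to count can be illustrated with a minimal sketch of the h-index. The citation counts below are hypothetical and chosen only for illustration; the function follows the standard definition of the h-index (the largest h such that h items have at least h citations each), not the implementation of any particular citation service.

```python
def h_index(citation_counts):
    """Return the largest h such that at least h items have >= h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical author: six works, two of which are book chapters
# (5 and 4 citations) that a given index does not cover.
all_works = [25, 8, 5, 4, 3, 0]
indexed_works_only = [25, 8, 3, 0]

print(h_index(all_works))           # h-index if every work is counted
print(h_index(indexed_works_only))  # h-index if the chapters are not indexed
```

Under these assumed numbers the same author receives a different score depending solely on the indexing parameters, which is the sense in which the measure partly reflects the system that produces it.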


4 Social Computing 


The technical debt of Google PageRank and other web link analysis systems to Eugene Garfield’s work in 
citation indexing is well established. Social Computing, encompassing not just web link analysis systems 
but also recommender systems such as Amazon’s search and social networking algorithms, extends the means 
and social logic of citation analysis to larger populations of users. Through logical mediators, metadata, and 
text analysis, webpages are ranked, ‘like’ works are identified, and ‘friends’ are found through other ‘friends.’ 
The basic means for documentary mediation (organizing knowledge through logical processes of difference 
and identity) are carried out for the user at the time of the search, mediating the need at the time that (or 
increasingly, with personal communication devices, before) an ‘information need’ is known by the user. 
Recursive algorithms introduce a winnowing of choices corresponding to the user’s needs, a process that 
‘gathers up’ past searches in future searches (Thomas, 2011, 2012). 

This ‘calling’ or “interpellation” (Althusser, 2001) of the user by a universe of documents that are 
pre-identified by recursive algorithms, and previously, by link analysis algorithms and the like, constitutes the 
documentary naming of a person as a certain type of information user. The user is called into presence 
within the socio-technical documentary system by an identification of information needs out of popularly 
or professionally measured choices and out of his or her past choices. “Information needs” do not arise 
simply out of personal interests, but rather, ‘interests’ are themselves interests in things known or knowable, 
and so, interests arise out of discursive, or more broadly stated, ideological, spheres. Documentary 
positioning (indexing) of both documentary objects and subjects is, then, ‘political’ in so far as 
‘needing’ information is that of positioning one’s self within informing collections of social materials through 
the use of language and technological/technical search infrastructures, which enfold within their dialectic 
historical trends and social norms. 

Louis Althusser’s theorization of “interpellation” (Althusser, 2001), as the calling of the subject by 
the law, is useful to recall here, particularly if we understand this calling in terms of social indexing or 
subject positioning through documentary systems. This interpellation is aided by the interpolation of data 
within social big data systems, so that predictive functions help define the subject not only in real time but 
in future times. Predictive algorithms work with past and present behaviors in order to define and guide an 
‘information need’ to specific documentary objects and ‘like’ objects. Social computing is, thus, the 
computational mediation of social and personal psychologies by logical functions of identity and difference, 
recursively leading to identity formations and identity constructions that follow existing social networks 
and the past searching habits of both the user and the larger social and cultural whole. (And in so far as 
social psychology is understood in terms of commodity markets, social computing acts as the mediator 
between commodities and consumers/producers, both being understood as mutually joined expressions of 
‘rational markets’ and personal ‘choices.’) Social computing is the expansion of citation indexing to broader 
and more general online environments. It mediates human interaction by technical infrastructures of 
recommender systems, Boolean functions, recursive algorithms, and natural language processing techniques 
for deriving semantic content and inferences. The increasing presence of these documentary algorithms and 
tools at the time of search or even before have made them, in our increasingly media concentrated lives, 
not only mediators, but creators, of our reality. In as much as they have joined us, they have also divided 
us into smaller silos of social care and responsibility. Our documentary devices (and their computational 
hardware) have increasingly become us, as we have become them; those increasingly invisible funnels of our 
concerns and our dreams, and not in the minority, of our narcissisms. As we will now see, some have 
attempted to give these “screen memories” (to borrow a term from Freud) a physical form, as well. 


5 Android Robotics 


Modern documentation begins with a materialist division between persons as readers or viewers of signs 
and of texts as bodies of signs as meaningful inscriptions. Modern documentation begins by reducing this 
relationship to a material relationship of retrieval, but as we saw with the Otlet quote that began this 
article, this materialism is immediately inverted by an even stronger idealization: the text now understood 
as a container for ‘information.’ This information, as a form of reference or ‘evidence,’ is the basis for the 
modern notion of the document. The task of modern documentary professions and technologies is not to 
unite a person with a text, but rather, a user with a document, though as we have seen it is the documentary 
or ‘informational’ universe of the document (derived through abstractions, representations, and fragments 
of the text) that structures user ‘information needs.’ 

Even if we were to understand the historical progress of the modern documentary tradition in the 
20th and into the 21st centuries as being chiefly characterized by the subject becoming a documentary object, 
we would only see half the picture. The other half, though, is still very much in the process of evolving: 
that of the becoming subject of documentary objects. It begins, of course, with human projections upon 
machines. 

Industrial modernity is made up of this ‘becoming’: mechanical levers, machines built on models of 
the body and the mind, machines built to mimic human organizations or to augment such. While 
documentary retrieval has sought to put the ‘content’ of documents (‘information’) into the minds of users, 
‘strong AI’ has sought to put minds into the coding of machines. Android robotics represents either a stage 
in this latter, toward a total simulacrum of humans, or, it may be seen as part of what we might call a 
‘communicative AI’ where human psychological projections upon the android form play an important role 
in reading intentions and meaning upon the android’s expressions, as well as providing real world training 
sets for the machine learning of human cognitive and affective expressions. 

The first step in either of these communicative AI options would be attempts to overcome ‘the 
uncanny valley’ (MacDorman, 2006; Mori, 2012) that prevents androids from being understood as humans 
by the persons interacting with the machines. A more natural communicative circuit may be desirable, if 
only to condition the machine to ‘learn’ based on more humanly natural training sets with it. One would 
think that mimetic transference would be easier with an android than with a text or even with a humanoid 
or other non-android robot, but this is often not the case. Just as only certain types of texts produce ‘eerie’ 
affects, such as ghost stories, stories featuring doppelgängers, and the like (Freud, 1959), people easily pick 
up signs of something not being right with one another, and so read into today’s rather primitive androids 
symptoms of hostility, illness, and even death (Mori, 2012), as well, of course, as bemusement regarding 
their mechanistic novelty. Just as uncanny narratives produce uncanny affects by deviating from scripts 
within realist frames of narrative, androids produce such affects by performative flaws within real 
interactions with humans. Their very appearance as humans makes them susceptible to producing uncanny 
affects. Even scripted performances, such as theater performances (Oh, 2010), which have traditionally been 
understood as simulacra, are not immune to this appearance of the uncanny. The ‘becoming subject’ of 
the document is very difficult to achieve, especially when such performances are not just through 
disembodied AI agents (such as call answering android voices), but rather, through physical robots that 
have the initial appearance of human beings. 

Nonetheless, we live in an era when robotics is increasingly coming to be incorporated into and 
to extend human subjectivity. Robotic call centers, human exoskeletons, voice controlled robots, and physical 
implantations are increasingly not only programmed to serve users, but also trained through user 
actions for performing mental and physical expressions. Together, humans and machines are enfolded within 
one another in the performance of normative acts, which are increasingly precise and singular, despite their 
normativity. Digital machines, as compared to analogue machines, are better designed to perform within 
parameters of possible actions in conjunction with human intentions. The construction of inter-subjective 
documents and their social and recursive inclusion in further expressive acts is not only a characteristic of 
human to human mediated communication, but also human to computer communication. 


6 Social Big Data and Neoliberal Governmentality 


One of the most important aspects of social computing now is the use of large data sets for predictive ends. 
Particularly in the case of emotive or in this sense ‘aesthetic’ actions (i.e., emotive, rather than logical, 
senses of ‘liking’; fashion and shopping trends; tastes, etc.; Bollen, 2011), technical interpolation and 
extrapolation, whereby documentary subjects and objects are brought together as conjoined data points of 
interest and longitudinal inferences are derived from this, constitute the documentary metonymic 
complement of ideological interpellation and social positioning in an age of big data. One is positioned as 
data, conjoined to documentary objects, and that mutual objectivity as conjoined data is recursively read 
back onto the self for future searches and for future self-presentation to interested others. One is called or 
interpellated by means of known parameters, but that inference through live, historical, or social surveillance 
and recording (‘tracking’) then gives rise to other inferences, whereby one extrapolates new possible 
conditions, and so forth. 

Social big data, particularly when combined with contemporary neoliberalism, which stresses 
self-positioning and competition within markets, leads to a new form of governmentality (Rouvroy, 2013). This 
new form of governmentality is one of control and self-control by large and recursive data, operating as 
self-reinforcing interpretive and behavioral command and control centers of cybernetic governance. As online 
mediated life becomes more ubiquitous, total, and common in the midst of a divided and isolated modernity, 
this new form of personal and social mediation uses documentary fragments to give persons and texts 
identity, expression, and value from their mutual positioning in parametric data fields. Life is expressed as, 
rather than simply through, each of us, and this representation slowly becomes us, singularly and as a 
whole. This is the documentary spirit, now given further force by neoliberalism, by the collapse of 
the welfare state, and by the increasingly obvious end of employment brought about by these same and similar 
tools of informatics. 


7 Critique 


It is often assumed that documentation is fundamental to critique, in so far as it provides evidence for states 
of the world. But, evidence is always evidence of something, and so itself derives from discourse. In the case 
of documentarily mediated knowledge, however, discourse — or in a more general sense, the organization 
of ideas, ‘ideology’ — is not the only source of the norms that evidence conforms to. There is a 
technologically mediated component, as well. The history of the modern documentary tradition in the 20th 
and now the 21st centuries has been one of the increasing dialectical interaction of ideological norms and 
technical/technological tools toward the construction of documents and users as ‘evidence’ of these 
socio-technical systems. 

To be useful and successful, these information and communication technologies and techniques must 
help users get information that is useful, that makes sense. Creating such conditions is the purpose of 
documentary devices and services. Through documentary devices and services, documents and their 
representational surrogates are organized and presented to users in useful manners. On the one hand, such 
organization must follow the logical processes of information organization and processing, as such occurs in 
both traditional documentary techniques and computational algorithms. (For example, search queries today 
are still largely keyword formulated rather than natural language queries; in cases where users do enter 
natural language queries, then the search algorithm may initially parse the queries to remove stop words and then 
calculate the relevance of the remaining key terms.) On the other hand, such organization must follow 
norms of human expectations—both for groups and for individuals. 
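
The query handling mentioned in the parenthetical above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any actual search engine: the stop-word list is a small hypothetical sample, and relevance is reduced to a raw count of how often the remaining key terms occur in a document.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real systems use larger ones).
STOP_WORDS = {"a", "an", "and", "are", "do", "for", "how", "in", "is",
              "of", "on", "the", "to", "what"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def key_terms(query):
    """Drop stop words from a natural language query, keeping the key terms."""
    return [token for token in tokenize(query) if token not in STOP_WORDS]


def relevance(query, document):
    """Score a document by how often the query's key terms occur in it."""
    counts = Counter(tokenize(document))
    return sum(counts[term] for term in key_terms(query))


query = "What is the history of citation indexing?"
doc = "Citation indexing began with Garfield. Citation analysis followed."

print(key_terms(query))       # key terms left after stop-word removal
print(relevance(query, doc))  # occurrences of those key terms in the document
```

Even in this reduced form, the choice of stop words and of the scoring rule determines which documents appear relevant, so the logical procedure already carries normative decisions.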

In sum, documentary techniques and technologies employ ideological and technical operations, in 
dialectical relation with each other. The Dewey Decimal Classification scheme, for example, enfolded 
normative 19th-century Western European cultural understandings of the major headings and divisions of 
general knowledge. Today, algorithms enfold group or individual harvested past searches in order to serve 
user needs. The logical order of the DDC and the order of subdivisions for cataloging are intuitive to anyone 
raised in the Western cultural tradition and educated in rhetorical procedures for argument. Computational 
processes built upon a logic of identity and difference, syllogisms and logical inferences, translate human 
affects of friendship networks and ‘liking’ into a computer mediated simulacrum, which then, cybernetically, 
may come to condition (or displace) face-to-face friendships. 

As I have suggested in this paper, the socio-technical dialectic is historical in its increasing 
mediation of not only information and communication through representational entities, but in the 
formation of identities of self and other as representational entities. As we live so much of our lives online 
and as so much of our social and professional lives are conducted and evaluated by our documentary 
production, analysis, and ranking, documentary mediation becomes more and more the infrastructure 
through which we understand our lives and those of others. We come to live a represented life, a life of 
‘digital experience,’ eventually, as represented data within normative parameters of social and cultural 
measure. In an economic system that is both increasingly informational and omnipotent within everyday 
life, and which is, partly because of such technologies, an economy of scarcity for the vast majority of people 
in even the ‘developed’ economies of the world, what remains outside of this counts, literally, for little if 
anything, though of course much does exist outside of it, not least, the worlds of affects, intentions and 
experience in more ‘general life’ outside of the documentary systems. We enjoy following our friends on 
Facebook, but we may be deeply touched in meeting them for real. We drag the ‘information’ that we find 
on the Internet into our lives and put it to the test in a more general ‘real life.’ We post on the Internet 
our experiences and photographs that occurred outside of the Internet, per se. There remain excesses and 
reserves to the more restrictive economies of online life, which are the source for much of the energy and 
meaning of the representations found there. But, some believe that we are increasingly losing sight of this 
‘extra’ to online and digital life and why it gives joy and optimism to such (Turkle, 2011). 

One really crucial question for affect, judgment, and knowledge at such an historical moment is 
what is the status of critique within the increasing presence of a documentary infrastructure mediating all 
forms of life? 

The notion of critique that I have in mind arises in the 18th century as part of a very specific 
moment in the West — the moment of the Enlightenment. This doesn’t mean that critique doesn’t occur 
or hasn’t occurred in other cultural or historical environments or moments, but rather it was 
institutionalized in the Western Enlightenment as a very specific set of assumptions about people and their 
rights to expression. Basically, people and small groups of people became understood as singular agents, 
or powers, of great and diverse potential, whose source didn’t lie in religious and monarchical orders and 
states. Public institutions arose to foster this, in commerce, knowledge (schools and libraries), organized 
sport, and many other institutions during the late 18th, 19th, and 20th centuries. Though later critiques arose 
to check some of these powers when, as individual or collective, they hindered the inherent powers of 
individuals overall (such as the communist critique of Marx), even here the underlying theme was that of 
the right of persons to express themselves and to overturn not only established institutions and dogmatisms, 
but also personal habits and identities. Constitutions were written to embody such rights as foundational 
or ‘inalienable’ to all human beings, and the beginning point for such was taken to be intellectual freedom 
demonstrated in the right of free speech. 

The usual framing is to see the documentary tradition as aiding the 
Enlightenment tradition by grounding truth claims in evidence. But, as I have been suggesting, the 
documentary representation of objects and persons as evidence and evidential claims, as is frequent in the 
modern documentary and neo-documentary tradition in LIS, can obscure, as well, the rights of persons and 
texts to critique and understanding. The Enlightenment self is not that of a representational identity, but 
rather, it is a set of potentialities of skills gathered by experience and deployed singularly — either by 
persons or by groups of persons — in regard to situations. ‘Experience,’ in such, is understood as patterns, 
models, and affects for analogical, but non-exact, iteration, combined with other past experiences in given 
situations according to judgment. Judgment, if it is rational, utilizes logical inference, but as a means, not 
as an end. Judgment is analogical, sometimes even allegorical, and is always experimental. For this reason, 
the wise person sees their judgments as provisional and as part of a conversation. One’s best judgment may 
not always be a correct judgment, because the logical is not the rational and theoretical knowledge is not 
practical knowledge. But by taking account of others’ experiences, better judgments are usually made. 

Because of the importance of judgment in all lives, more information is reasonable and 
greater dialogue and difference are preferable. What the best amount of either may be in many situations is 
debatable. The best that we can do is to have a broad knowledge, be patient with our own and others’ 
judgments, and continually interrogate ourselves toward this. It is true that necessity breeds action, but 
patience and listening breed the judgment that is necessary. 

But particularly in habitual or dogmatic situations, not the least in regard to the self’s own 
relationship to his or her own opinion, prior to judgment lies the necessity to first open up the space of 
judgment by critique. If critique is threatened by the recursive closing of singularity and experience, then 
judgment becomes increasingly ideological and even formulaic. (Though, the more ideological and formulaic 
it is, the easier it is to claim increasing levels of relevance in information retrieval.) 

Critique is threatened today on many levels, from the surveillance and governance of persons by 
states, governments, and corporate bodies, to instrumentalism in education, to the pressures of competition, 
money, and time in neoliberal scarcity, and to shrinking publication markets and the consolidation of media. 
In this article I have given several examples of an historical trend in documentary modernism toward the 
increasing representational mediation of texts and persons as information objects and as data points. Such 
mediation increasingly is occurring at the level of infrastructures of everyday life as well as scholarly 
knowledge. While there are very many reasons to celebrate the ‘information age’ and to claim that 
documents lie at the heart of modern knowledge and civilization, there is also a darker side to the story 
that I have tried to suggest in this article. A fuller account is usually preferable to a narrower account, and 
the simple purpose of this article has been to give the reader something to consider about the relation of 
modern information and communication techniques and technologies to lived experience. 
Abstract 

This paper examines how young adults process information related to privacy, and how that affects their 
attitude towards behavioral targeted advertising. Differences between computer novices and experts were 
examined based on the Elaboration Likelihood Model (Petty & Cacioppo, 1984), which argues that people 
who have the ability to process information do so differently than those who do not have the ability. 
Consistent with the theory, we found that computer novices were relying on peripheral cues to process 
information related to security due to their lack of knowledge. We also identified an “uncanny valley” 
effect where people liked customization of targeted advertisements, but then became uncomfortable if the 
advertisements seemed to know too much of their past behavior until the suggestions were perfectly 
aligned with their interests. 
1 Introduction 

Behavioral targeting, or targeted advertising, is a type of personalized advertising in which advertisers track 
and monitor the web-browsing behavior of individual consumers across multiple websites in order to provide 
consumers with advertisements related to their previous online activities (McDonald & Cranor, 2009). Prior 
studies on personalized advertising have shown that consumers generally think advertisers violate their 
privacy (Yu & Cude, 2009) and that they generally avoid online advertising (e.g., Jin & Villegas, 2006). A 
Pew survey of 2,000 adults in the U.S. showed that 68% were “not okay” with targeted ads while 28% said 
that they are “okay” with it (Purcell, Brenner, & Rainie, 2012). 

This study examines people’s attitudes towards targeted ads, focusing on how users process 
information presented on websites, and how that affects their attitude related to targeted advertising. The 
Elaboration Likelihood Model (ELM; Petty & Cacioppo, 1984) is a theoretical framework that explains how 
people process information. According to ELM, there are two routes of information processing: the central 
route and the peripheral route. These two separate routes differ in the amount of thoughtful processing, or 
elaboration. The central route is a rational process where people carefully consider the information that is 
presented and base their judgments on the strength of the arguments. When information is processed 
centrally, compelling arguments will have persuasive power, while weak arguments will be counter-argued 
or resisted. The peripheral route is taken when individuals do not diligently consider the pros and cons but 
instead use minor factors, rather than the quality of the information, to form their attitude. ELM states that 
for people to process information in a rational manner, they must have both the ability and the motivation 
to understand it. The original theory focuses primarily on text messages; we were thus interested in how 
this theory would apply in the context of visual cues on websites that are related to behavioral targeting. 
Our research question thus concerns the relationship between users’ ability to process information (based 
on their computer expertise) and their online behavior. 
RQ1: How is computer expertise related to consumers’ online behavior? 


2 Methods 


We conducted 22 in-depth interviews for this study. Participants were recruited via email through existing 
acquaintances. Before the interview, we asked participants about their educational level, work background, 
college major, level of computer familiarity, and computer usage in their daily life. These preliminary 
questions were designed to gain some insight about their computer expertise. We applied criteria that large 
technology companies employ for usability tests to categorize participants into three levels of computer 
expertise: experts, semi-experts, and novices. Experts were those with more than 10 years of active computer 
and Internet usage history, who worked in a job that involved computer system administration, and who had 
(or were pursuing) a post-secondary degree in computer science or information technology. Semi-experts were 
defined as users who had been familiar with computer and Internet technology for four to ten years but did not 
have any experience with computer systems or a computer science or technology-related degree. Novice users 
in our study were defined as users who had been using computers and the Internet for one to four 
years. Table 1 outlines participant demographics and computer expertise. 


ID     Sex    Age    Computer expertise 
P1     M      23     Semi Expert 
P2     F      29     Semi Expert 
P3     F      30     Semi Expert 
P4     M      23     Expert 
P5     F      22     Expert 
P6     F      22     Novice 
P7     M      28     Semi Expert 
P8     F      20     Novice 
P9     M      23     Semi Expert 
P10    F      24     Semi Expert 
P11    F      23     Novice 
P12    F      24     Novice 
P13    M      24     Expert 
P14    M      23     Novice 
P15    M      23     Expert 
P16    F      26     Semi Expert 
P17    M      27     Novice 
P18    F      23     Novice 
P19    F      24     Expert 
P20    M      27     Expert 
P21    F      24     Novice 
P22    M      25     Expert 


Table 1: Participant Demographics 


Interviews lasted from 45 to 60 minutes. They were audio recorded and transcribed by a research assistant 
and double-checked for accuracy. Our participants ranged in age from 20 to 30 (M = 24.38, SD = 2.51); 
36% were male. Eighteen participants (80%) were Caucasian, three participants were Asian, and one person 
was African American. 
2.1 Coding 


Three researchers, including the authors, participated in the coding. The coding was an iterative process; 
first, an inductive approach was taken where major “themes” identified in the protocol were listed into a 
spreadsheet. We then took a deductive approach and identified more specific themes and patterns that 
emerged through the interviews and created a data matrix with participants’ quotes entered into the cells. 
This enabled us to see how representative the themes were across the different users. However, we were not 
just interested in finding commonalities, but also in unique cases. Participants were also coded by computer 
expertise, as outlined above and in Table 1. 
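
The expertise criteria described in the Methods section can be sketched as a simple decision rule. This is an illustrative reading and not the procedure actually used in the study: the function and parameter names are hypothetical, the handling of exactly four years (treated here as semi-expert) is an assumption, and profiles that the stated criteria do not cover are returned as "Unclassified".

```python
def classify_expertise(years_of_use, sysadmin_job, cs_or_it_degree):
    """Map a participant profile to an expertise level.

    years_of_use:    years of computer and Internet use
    sysadmin_job:    True if the job involved computer system administration
    cs_or_it_degree: True if holding or pursuing a CS/IT post-secondary degree
    """
    # Experts: more than 10 years, a system administration job, and a degree.
    if years_of_use > 10 and sysadmin_job and cs_or_it_degree:
        return "Expert"
    # Semi-experts: four to ten years, without the job experience or degree.
    # Assumption: exactly four years falls here rather than under "Novice".
    if 4 <= years_of_use <= 10 and not sysadmin_job and not cs_or_it_degree:
        return "Semi Expert"
    # Novices: one to four years of use.
    if 1 <= years_of_use < 4:
        return "Novice"
    # The stated criteria leave some profiles undefined.
    return "Unclassified"


print(classify_expertise(12, True, True))
print(classify_expertise(6, False, False))
print(classify_expertise(2, False, False))
```

Writing the rule out this way makes visible the profiles the criteria leave open, for example a long-term user without a technical degree, which in an interview study would presumably be resolved by researcher judgment.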


3 Results 


Consistent with prior studies (e.g., McDonald & Cranor, 2009), most participants generally agreed that 
behavioral targeting had pros and cons. (The one exception was P5, who majored in advertising and was 
extremely favorable towards all types of advertising.) The reasons participants liked targeted ads were mostly 
the same, namely usefulness and personal relevance: 


“I am part of a reward program. I made one purchase on my reward card and I start to get emails 
from that company months after I made the purchase. So it’s been intrusive. But I enjoy having 
the rewards and I think it’s a trade-off.” (P16, semi expert) 


3.1 Differences between Experts and Novices 


We saw distinct differences between experts and non-experts in how they processed information and their 
subsequent attitudes. In general, strong negative sentiments about behavioral targeting expressed by novices 
and semi-experts were emotional, based on fear. According to P6, a novice, “It’s very useful but kind of 
scary because your computer can know more about you than you even know, like tracking where you are 
going.” Semi-expert users were also more concerned regarding the vulnerabilities of their privacy online: 


“I have viewed the Internet as a hole, where there is always someone who is watching you. I think 
the moment you get an IP address, you pretty much showcase aloud to the world when it comes 
to your personal private information.” (P16, semi expert) 


“I think it’s highly unnecessary [to collect personal information] when you are accessing the 
advertisement and sometimes even buying from these companies... to me that seems illegal. It’s 
affecting your constitutional rights.” (P9, semi expert) 


Experts, on the other hand, were more annoyed than threatened at the limitations of the technology. “I 
think most of the behavioral targeting including Google Ads is a failure because it does not match up the 
content and just scans key words,” said P13 (expert). Experts were also less concerned about data retention 
due to their understanding of how information is stored. “Information is usually deleted as space requires 
or in the normal course of business,” P4 said. 


3.1.1 Peripheral Cues 


Novices and semi-experts had a tendency to strongly rely on peripheral cues to make judgments about 
credibility or trustworthiness of the site. Source credibility is one of the peripheral cues that Petty and 
Cacioppo (1984) outline in their model and has been found in examples such as celebrity endorsements
(Petty et al., 1983), symbols in recommender systems (Resnick and Varian, 1997) and corporate credibility 
(for overview, see Wathen & Burkell, 2001). The brand name of the website operator was certainly a 
criterion that many novice participants noted as being important in making judgments about the security 
of the site:


“I am kind of careful every time I use a credit card and that’s it. I use websites I can trust like
Amazon, Publisher... I believe they have very good security. There are hackers or people who want 
to attack the website so they should have good security.” (P7, semi expert) 


Novices were also looking at privacy “seals” to gauge the security of the websites. A few mentioned that 
they even went to the website offering the seals to verify the site. One woman said, “I buy stuff on 
Amazon.com, JC Penny.com. I trust the seal.” (P12, novice) Another novice user (P11) talked about 
Captcha tests as an indicator of a secure site. Although she was not familiar with the term “Captcha” she 
described the feature, saying that it was “nice to have something to click and type in to test if you are a 
real person or not.” Participants also thought that websites that looked more reliable or professional in 
terms of design were more credible. 

However, those with high levels of computer expertise were not very convinced by seals or other 
peripheral cues. P13 talked about how seals do not add anything to his trust of the websites because 
“businesses know how to manipulate things.” Another expert user (P15) pointed out that seals were just 
an indicator of the website operator investing a little more money. “If you pay fifty bucks, you can get it 
on your website,” he said. 

Sponsored links gave mixed signals in terms of credibility. Many semi-experts and novices were not 
able to distinguish sponsored links from unsponsored links, but among the people that did, novices perceived 
sponsored links as being more secure sites. The following quote illustrates a participant’s misunderstanding 
of what a sponsored link is: 


“I am inclined more towards sponsored ads as they are more reputable in terms of security and
privacy. They usually are a stronger company and are usually safe from viruses, Trojans, and 
malware. Unpaid ads have a chance for more viruses.” (P1, semi expert) 


Experts, on the other hand, used different peripheral cues such as the “https” in the website address because 
they perceived this as a cue for a secure site. They also placed less importance in brands and privacy seals 
in comparison to novices. 

Although we did not ask participants about what kind of computer they used, several participants 
identified themselves as Apple computer users and displayed confidence in protection from spyware. 
Sometimes, this led to a false perception of security among novices. Semi-expert and expert users were also
more familiar with anti-virus applications. They used different types of anti-virus software, partitioned
their computers to prevent viruses, and used other types of security software:


“I run two programs simultaneously. One is Active Monitor and one is an anti-virus. Active Monitor
scans the computer basically and clears up the stuff.” (P3, semi expert) 


3.1.2 Privacy Policies 


Experts reported that they read privacy policies—not very carefully, but usually when they were using a 
website for the first time, they scanned through the policy for certain indicators:


“They are so long and I don’t have thirty-five minutes to read them. So I scan them, and I want 
to see if anything scary pops up, like jail. If they say something like they will charge you for using 
the website, I won’t go on it.” (P5, expert) 


Novices and semi-experts, however, rarely read privacy policies for three main reasons. The first was that 
users did not read privacy policies for websites that they trusted, such as Amazon, Google or Best Buy. 
The second reason participants avoided reading privacy policies was because of the lengthiness of the 
message itself, which provided a distraction from what they wanted to do. This was particularly prevalent 
for online shopping sites. As one participant put it, it didn’t matter what the privacy policy said because 
she wanted to buy the product. The third was a perceived social norm; participants were assuming that no
one else would read the privacy policy. “I think no one is gonna read that really. People don’t spent twenty
minutes to read the stuff,” P6 said. 


3.2 The “Uncanny Valley” Effect 


An unexpected finding emerged through the interviews as participants talked about a “weird” negative 
feeling that they sensed when the targeted ad was getting too personal. Their comments strongly resonated
with the phenomenon that takes place in the “uncanny valley” effect (Mori, 1970). This term, initially 
conceptualized to describe how humans feel about robots, claims that as a machine acquires greater 
similarity to humans, it becomes more appealing. However, when it becomes too close to the likeness of a 
human, people experience a strong discomfort; when a machine looks “perfectly human” the positive 
emotions are revived. Other than robots, the idea of the uncanny valley has been applied to other computer- 
generated entities such as animated characters in video games and movies (Seyama & Nagayama, 2007). 

A similar phenomenon was seen in behavioral targeting: if the targeted ad was too obvious
about tracking the participant’s behavior, they felt uncomfortable:


“If you are too aggressive in collecting information, people start seeing ads related to their 
geographical location, IP address, or stuff like that, it really bothers some people. They will get 
mad.” (P13, expert) 


“If I purchased something online yesterday or very recently I will be disappointed [by the targeted
ad] but if I buy something long ago and then get the ad I will be okay because if I buy it a long 
time ago there is a random chance that they may not have used my browsing behavior.” (P2, semi 
expert) 


Ads that were relevant to the participant’s interests and well-targeted, but lacking in usefulness due to ill 
timing, were perceived as being worse than irrelevant ads. For example, several participants complained 
about receiving ads for products they had just bought: 


“Say I have bought ten dozen boxes of Coca-Cola. Then an advertisement of Coca-Cola will be 
shown to me again and again, even if I don’t need to buy it.” (P3, semi expert) 


However, if the advertisement was taking place in a specific purchasing environment where the system was 
making recommendations that perfectly aligned with the user’s interests, the targeting was no longer 
perceived as being “weird.” For example, P14, a novice, described how his prior searching behavior on an 
online bookstore led to advertisements of other related books. He compared this to the music-streaming 
website Pandora, which plays songs that are similar to the ones you search. 


4 Conclusion 


This paper provides a cognitive explanation of young adults’ differing privacy attitudes and behaviors. We
found that how users process information in relation to privacy and security issues online was very different 
based on the level of their computer expertise. Low-expertise users relied on peripheral cues in order to 
make judgments about websites’ information-collecting activities. 

Even within our sample, we found a considerable privacy knowledge gap between expert,
semi-expert, and novice users. Given that our participants are in the age bracket of 20 to 30, the findings from
this study may be considered a valuable step in understanding how users process
information with regard to their online privacy practices. Such understanding may provide useful guidelines
for creating policies on the use of users’ personal information for business purposes.

Our interview data also suggest that major brands should work hard to maintain their credibility
so as not to violate the expectations of customers with low computer expertise, who are more trusting of
large brands and take the brand name as a credibility cue. On the other hand, smaller brands may want
to incorporate as many peripheral cues as they can, such as sponsored links, security seals, user reviews,
and professional-looking interfaces, especially if the target customer is anticipated to have low computer
expertise.


1 Introduction


One of the expressions of globalization is dependence on data from multiple nations and international 
organizations that can be applied to examine and resolve international concerns and demonstrate the impact 
of their activities (Development Impact; OpenData; Zollie, 2013). Shared data, specifically shared 
international data, has been instrumental for global topics such as climate research, health, migration, and 
economics. 

While international organizations have long been committed to activities for the betterment of 
global issues, it is only recently that they have become aware of the importance of sharing the data they 
produce with people outside their agencies. 

Not all organizations and governments make their data available to the general public, whether due 
to policy, lack of awareness, or shortage of funds. Older data that predates the recent demand brought on 
by the popularity of the data age can be particularly difficult to come by. Researchers in need of data that 
is not available in existing datasets can mine documents, often available in formats such as Microsoft Word 
or PDF files, to extract the information they need and turn it into a dataset, a practice known as data 
scraping. Data scraping is an important contribution to the ability to share data. 

This paper describes a partnership between a researcher and a programmer working together to 
uncover data about the United Nations peacekeeping operations (PKO). The goals of the project are 
twofold: first, to uncover the raw numbers available about UN peacekeeping missions, in order to reveal 
information about the number of troops allocated by different countries to the various missions, as well as 
numbers about the funding of these missions; second, to describe the process of scraping data and place the 
practice in the broader context of social science research in the field of open data from intergovernmental
organizations.


2 The United Nations peacekeeping missions 


Founded in 1945, the United Nations (UN) is an international organization that was established following 
the disbanding of the League of Nations and the end of World War II. The UN is governed by a charter 
and it is through this charter that peacekeeping is supported, specifically through Chapter VI, Pacific
Settlement of Disputes, and Chapter VII, Action with Respect to Threats to the Peace, Breaches of the
Peace, and Acts of Aggression (United Nations, 1945). 

Yet in spite of its favored standing in the role of peacekeeper, UN PKO activities have also been 
the subject of criticism and public scrutiny. Some critics see PKO, particularly long-term operations such 
as the missions to the Middle East, India-Pakistan, and Cyprus, as perpetuating existing conditions by 
lifting responsibility from conflicting countries (The Panel..., 2000). 


3 UN PKO: Available data


The United Nations does not have an open data policy, and in this it is not unique among governments and
NGOs. Since data gathering and dissemination is not core to the responsibilities of such organizations, 
transforming information into data is often the responsibility of the user (Whitmore, 2012). There is a 
plentitude of information available to the public through their main statistical website (UN data), and in 
some cases these statistics are available for download as raw data, but availability varies. 

The main UN peacekeeping portal (United Nations peacekeeping) provides statistics and basic facts, 
but does not offer raw data about the number of troops, number of troops per mission, funds allocated by 
country, funds allocated per mission, or funds allocated per country per mission. Such data is instrumental 
to researchers trying to answer questions regarding specific missions, or broader questions such as ways to 
achieve cost reduction or examine the benefits of interventions, relations of troops or expenditures to 
casualties, participation parity, or cost benefit analysis of PKO. 


4 Data scraping 


Data scraping began gaining popular usage in the 2000s, aided by the increase of data sources available on 
the web and the momentum built by open-source and open-access public information advocacy. Data sources 
are shared through data management systems (DMS), also referred to as data hubs, that allow gathering, 
sharing, and using data. In order for users to work with data for their intended purposes, a DMS should 
ideally allow the following capabilities: (1) load and update data from any source; (2) store datasets and 
index them for querying; (3) view, analyze, and update data in a tabular interface such as a spreadsheet; 
(4) visualize data, for example with charts or maps; (5) analyze data, for example with statistics and 
machine learning; (6) organize many people to enter or correct data (crowd-sourcing); (7) measure and 
ensure the quality of data and its provenance; (8) define data permission as open, private, or shared; (9) 
find datasets and organize them to help others find them; and (10) sell data, sharing processing costs 
between users (Irving & Pollock, 2012). 

We identify four uses for scraped data: the first is in journalistic
statements, the second is to produce tables that cull data from various sources, the third is for
software, and the fourth is in analysis. While these areas are not always distinct, the following
examples can help explain each category of use.

Journalistic statements: A journalist may hypothesize that some nations use UN peacekeeping 
missions as training ground to prepare their soldiers for war, but would need to first uncover supporting 
evidence. Scraping data is often the only way to uncover such data from public sources, as will be 
demonstrated in the paper. 

Tables: Scraping can be used to gather data, store it in a spreadsheet, and create statistics
from it. As suggested in an example by Paul Bradshaw (2010):


“You might use a screen scraper to gather information from a local police authority website, and
store it in a lovely spreadsheet that you can then sort through, average, total up, filter and so on 
— when the alternative may have been to print off 80 PDFs and get out the highlighter pens, Post- 
Its and back-of-a-fag-packet calculations.”

Software: Part of the scraping process involves turning data in one format (e.g., text in a PDF file) into
another, machine-readable format, such as a data structure in a language like Python, that allows users to run a computer program
on it. ScraperWiki provides a number of tutorials for programming scraped data (ScraperWiki
Documentation). 

Analysis: In a recent news story about immigration policy for the rich, researcher Dune Lawrence 
wanted to examine the success of an immigration policy that allows immigrants to buy U.S. citizenship. 
The researcher found that the INS does not make this data available, and had to cull data from many 
indirect sources to provide an analysis of the phenomenon. Lawrence ended up using indirect data from the
INS about the EB-5 program (an immigration program that allows wealthy investors to get a green card in 
exchange for funding American businesses), such as lawsuits filed or number of applications granted. 
Lawrence found that no evidence is collected that will be able to provide a picture of the true economic 
value of the EB-5 program (Investing in citizenship, 2013). 


5 The scraping process 


The scraping process typically consists of four steps: Step I, scraping; Step II, analysis; Step III, presentation; 
and Step IV, publishing and promoting. These help us progress from the initial idea that prompted the
scraping, through recognizing its potential, all the way to pulling in and transforming the original raw data 
and determining a way to analyze it. For programmers it is akin to prospecting for diamonds, finding them 
in the earth, cutting a mine tunnel with bare hands, separating the diamond from the rock, roughing out 
its edges, and gluing it onto a steel washer for a ring to show the world. 

Once the data sources have been identified, Step I in the process is the scraping itself, i.e., producing 
usable data. As indicated above, numeric information about UN Peacekeeping missions—particularly 
information regarding the funding of the missions — is not readily available from the UN, and what little 
is available is in the form of non-searchable PDF files. 

A typical PDF can be viewed in Figure 1: 


UN Mission's Summary detailed by Country
Month of Report: 31-Jan-12

Country     UN Mission   Description             M     F   Totals
Argentina
            MINURSO      Experts on Mission      3     0        3
            MINURSO                                             3
            MINUSTAH     Individual Police      19     1       20
                         Contingent Troop      688    33      721
            MINUSTAH                                          741
            UNFICYP      Contingent Troop      251    14      265
            UNFICYP                                           265
            UNMIL        Individual Police      13     0       13
            UNMIL                                              13
            UNMISS       Individual Police       1     0        1
            UNMISS                                              1
            UNOCI        Individual Police       2     1        3
            UNOCI                                               3
            UNTSO        Experts on Mission      3     0        3
            UNTSO                                               3
Argentina                                                   1,029
Australia
                         Experts on Mission      2


Figure 1: UN Missions summary, detailed by country, Source: ScraperWiki Data Blog


The table in Figure 1 shows that during the month of January 2012, the government of Argentina sent
three of its citizens to the United Nations Mission for the Referendum in Western Sahara, 741 people to 
the United Nations Stabilization Mission in Haiti, 265 to the United Nations Peacekeeping Force in Cyprus, 
and so forth. For coding purposes, this format is relatively easy to work with, since it is consistent and the 
coder was able to complete it with about 160 lines of code. After getting the code working, we cleaned it 
up by removing all the leftover print statements until all that would be produced at runtime was a message 
appearing when a new month became available in the database. The email-generating code is on line 34, 
and it regularly sends an email as captured in Figure 2: 


Subject: UN peacekeeping statistics for 2012-01 


Dear friend, 
There are 788 new records in the database for 


https://scraperwiki.com/scrapers/un_peacekeeping_statistics/ 
after month 2011-12 to month 2012-01 


Figure 2: Sample e-mail alert for new document, Source: ScraperWiki Data Blog 
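
The alert in Figure 2 could be composed along the following lines. This is a minimal sketch and not the scraper's actual email code: the function name build_alert, its parameters, and the example.org addresses are assumptions made for illustration, and only the subject and body wording are taken from Figure 2.

```python
from email.message import EmailMessage

SCRAPER_URL = "https://scraperwiki.com/scrapers/un_peacekeeping_statistics/"


def build_alert(prev_month, new_month, new_records, sender, recipient):
    """Compose an alert like Figure 2, or return None when nothing is new.

    Months are 'YYYY-MM' strings, so plain string comparison orders them.
    """
    if new_month <= prev_month or new_records == 0:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"UN peacekeeping statistics for {new_month}"
    msg["From"] = sender        # hypothetical address
    msg["To"] = recipient       # hypothetical address
    msg.set_content(
        "Dear friend,\n"
        f"There are {new_records} new records in the database for\n"
        f"{SCRAPER_URL}\n"
        f"after month {prev_month} to month {new_month}\n"
    )
    return msg


alert = build_alert("2011-12", "2012-01", 788,
                    "scraper@example.org", "reader@example.org")
print(alert["Subject"])
```

Sending the message (for example with smtplib) is left out, since the mail setup used by the original scraper is not described here.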


The preceding step resulted in a table of over 86,000 records stretching back to January 2003, and now 
gives way to the second step: analysis. 
The important columns in the table are shown in Figure 3: 


month text, 
country text, 
mission text, 
people integer 


Figure 3: Source: ScraperWiki Data Blog
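
To make the link between the report layout in Figure 1 and the columns in Figure 3 concrete, the following sketch turns text lines of that layout into records with the fields month, country, mission, and people. It is an illustration under stated assumptions, not the original 160-line scraper: it assumes the PDF text has already been extracted into lines, that mission codes are runs of capital letters, and that a detail row ends in three numbers (M, F, Totals). The names parse_report, MONTH_RE, and DETAIL_RE are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Sample lines modeled on the layout of Figure 1 (already extracted as text).
SAMPLE = """\
UN Mission's Summary detailed by Country
Month of Report : 31-Jan-12
Country UN Mission Description M F Totals
Argentina
MINURSO Experts on Mission 3 0 3
MINURSO 3
MINUSTAH Individual Police 19 1 20
Contingent Troop 688 33 721
MINUSTAH 741
UNFICYP Contingent Troop 251 14 265
UNFICYP 265
Argentina 1,029
"""

MONTH_RE = re.compile(r"Month of Report\s*:\s*(\d{1,2}-[A-Za-z]{3}-\d{2})")
# Optional mission code, a description, then the M, F and Totals numbers.
DETAIL_RE = re.compile(
    r"^(?:([A-Z]{3,})\s+)?([A-Za-z ]+?)\s+(\d+)\s+(\d+)\s+([\d,]+)$"
)


def parse_report(text):
    """Return one record per (month, country, mission) with summed people."""
    month = country = mission = None
    in_body = False
    totals = defaultdict(int)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        found = MONTH_RE.search(line)
        if found:
            date = datetime.strptime(found.group(1), "%d-%b-%y")
            month = date.strftime("%Y-%m")
            continue
        if line.startswith("Country "):   # column header row starts the body
            in_body = True
            continue
        if not in_body:
            continue
        detail = DETAIL_RE.match(line)
        if detail:
            # A row without a mission code continues the previous mission.
            mission = detail.group(1) or mission
            people = int(detail.group(5).replace(",", ""))
            totals[(month, country, mission)] += people
        elif not any(ch.isdigit() for ch in line):
            country, mission = line, None  # a bare name starts a new country
        # Remaining lines are subtotal or total rows and are skipped.
    return [
        {"month": m, "country": c, "mission": s, "people": p}
        for (m, c, s), p in totals.items()
    ]


records = parse_report(SAMPLE)
for record in records:
    print(record)
```

Summing the detail rows rather than reading the subtotal rows is a design choice of this sketch; the subtotals could instead be used as a consistency check.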


This data, with the addition of SQL, allows the creation of hundreds of relevant timeline graphs. For 
example, one can answer questions such as: What are the three top countries in terms of maximum 
deployment to any mission, or: To which missions do these three neighboring, sometimes-at-war, rival 
countries predominantly send their troops? (Detailed step-by-step examples for analyzing the data are
available from Todd, 2012.)
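
As an illustration of the first question, the sketch below runs a query for the three countries with the largest single deployment, using Python's built-in sqlite3 module. The table name swdata is borrowed from the query shown in Figure 6 and the columns follow Figure 3; the rows are invented placeholders, not UN figures, and the query is one possible formulation rather than the one used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE swdata (month text, country text, mission text, people integer)"
)
# Placeholder rows for illustration only; these are not real deployment figures.
conn.executemany(
    "INSERT INTO swdata VALUES (?, ?, ?, ?)",
    [
        ("2012-01", "Country A", "MISSION X", 500),
        ("2012-02", "Country A", "MISSION X", 700),
        ("2012-01", "Country B", "MISSION Y", 300),
        ("2012-01", "Country C", "MISSION X", 900),
        ("2012-01", "Country D", "MISSION Y", 100),
    ],
)

# Largest deployment of each country to any single mission in any month,
# keeping the three highest.
TOP_THREE = """
SELECT country, MAX(people) AS max_people
FROM swdata
GROUP BY country
ORDER BY max_people DESC
LIMIT 3
"""
top_three = conn.execute(TOP_THREE).fetchall()
print(top_three)
```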

The data can be pared down further to look at just the deployment of peacekeepers
from India, Bangladesh, and Pakistan to MONUC (United Nations Organization Mission in the Democratic
Republic of the Congo) over time, illustrated by Figure 4. As the figure shows, the pattern of deployment tends
to remain at a constant quota over many years, with sudden jumps, probably due to requirements on the
ground. Pakistan appeared to supply both of these peacekeeping surges, once in 2003 and once in 2005,
while Bangladesh surged at one and India surged at the other.


[Timeline graph; legend at June 15, 2010: people_india 4.55 k, people_bangladesh 2.82 k, people_pakistan 3.65 k]


Figure 4: Source: ScraperWiki Data Blog 


Following Step I, scraping, and Step II, analysis, we enter Step III, presentation. 

Data scrapers are typically trained as programmers and feel inept when it comes to presentation. 
It is not uncommon that projects get abandoned at this stage, since programmers are intimidated by the
challenges posed by design. While this is time-consuming, the job is not done until the datasets can be 
presented in a format comprehensible to non-programmers. 

In the case of the UN peacekeeping mission, we used an interface for generating graphs of the 
queries that people might be interested in. It took two hard hacking sessions to get it into this form — 
twice as long as it took to write the original scraper — and the result can be seen in Figure 5: 


[Screenshot of the interactive interface: a timeline graph titled “Number of people per selected country to MINUSTAH”
(legend shown for January 15, 2012), a “Make timeline” button, selection lists for “Contributor nations” and
“Peace-keeping missions”, a “Top contributions” list (e.g., Brazil -> MINUSTAH (2308), Jordan -> MINUSTAH (1522),
Uruguay -> MINUSTAH (1172)), and the SQL query generated for the selection.]


Figure 5: Source: ScraperWiki Data Blog 


When the page initializes, there are three Ajax callbacks to the database to obtain the lists of countries,
missions, and top contributions from countries to specific missions. Users can make multiple selections from
the countries and the missions lists to create timeline graphs of the numbers of people involved. If one selects
only from the countries list, the graph shows the troop contributions from those countries to all UN missions. Users
can additionally select a single mission, and the graph will show those countries’ contributions to that specific
mission. The same works in the other direction, for lists of missions by country. The top contributors table
helps identify the top countries (or missions), so you know which ones to select to make an interesting
graph with content (e.g., there is no point in graphing the number of Italians deployed to Nepal, because
there aren’t any).

Where do the Italians go? You can find that out by selecting “Italy” from the “Contributor nations” 
column and clicking the “Refresh” button on the “Top contributions” column. And you can also click on 
“Make timeline” to discover that Italy never sent anyone anywhere until late 2006, when they suddenly 
started deploying two to three thousand peacekeepers to Lebanon. What happened then? Did something 
change in Italian politics around that point? That would be the subject of another project. 

The fourth and final step in the process is publishing and promoting the data, by far the least 
straightforward part of the process. Typically the way in which the data is made public reflects the goals 
for which it was collected (see Section II). Journalists, academics, market researchers, lobbyists, and other 
potential interested parties will use the data to support their goals. 


6 The case of the Democratic Republic of Congo 


To further illustrate the process and utility of the scraping process, we focused on the UN peacekeeping 
mission in the Democratic Republic of Congo (DRC). 

The UN established a peacekeeping mission in the Congo in 1999. The mission operated until 2010
under the name MONUC (Mission de l'Organisation des Nations Unies en République démocratique du
Congo), and in 2010 the mission’s name was changed to MONUSCO. The mission to Congo was authorized
to operate until June 30, 2013. There are currently 20,000 uniformed personnel serving in the Congo with
an approved budget of US$1.4 billion for the 2012-2013 fiscal year (MONUSCO facts and figures). The
number of troops in Congo increased by 440% from the original 3,700 troops deployed when the mission
was first established (Bellamy & Williams, 2011, 122).

To dig deeper into a specific PKO, such as MONUC, we will look more closely at the
deployment of peacekeepers from India, Bangladesh, and Pakistan to MONUC over time. We chose these
particular country deployments to Congo to test a journalistic hypothesis that these peacekeeping
missions are a clever way for nations to get their troops battle-hardened before the inevitable conflict on
their own territory. In other words, they also serve as war-training missions.

We created a SQL query (Figure 6): 


General graph of things

Useful for quickly generating graphs from sql queries. To do: timeline graph, layout (eg float right), multiple
fields in output, create direct csv/html download links from api

Scraper name: un_peacekeeping_statistics
API key:
Attach:
SQL:
    SELECT month||'-15',
           sum(people*(country='India')) as people_india,
           sum(people*(country='Bangladesh')) as people_bangladesh,
           sum(people*(country='Pakistan')) as people_pakistan
    FROM swdata
    WHERE mission='MONUC'
    GROUP BY month
    ORDER BY month

jsondict | https://api.scraperwiki.com | download | apidocs
(Table) (Image) (Graph) (Timeline) (Link) (Schema)


Figure 6: Generating graphs from SQL queries, Source: ScraperWiki Data Blog


This query creates a dynamic JavaScript timeline, presented as a bitmap image (Figure 7). As the timeline
shows, the pattern of deployment tends to remain at a constant quota over many years, with sudden jumps,
probably due to requirements on the ground. Pakistan appeared to supply both of these peacekeeping
surges, once in 2003 and once in 2005, while Bangladesh surged at one and India surged at the other.
Similarly, this process can be repeated for Bangladesh and Pakistan. 
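
The query in Figure 6 relies on a comparison such as country='India' evaluating to 1 or 0 in SQLite, so that sum(people*(country='India')) adds up only that country's rows and yields one column per country. The sketch below builds the same kind of query for an arbitrary list of countries and runs it with Python's sqlite3 module. The helper build_timeline_query, the column alias day, and the sample rows are assumptions for illustration; the rows are invented placeholders, not UN figures.

```python
import re
import sqlite3


def build_timeline_query(countries):
    """Build a one-column-per-country query in the style of Figure 6."""
    columns = []
    for name in countries:
        # Derive a safe column alias; the values themselves are bound as parameters.
        alias = "people_" + re.sub(r"\W+", "_", name.lower())
        columns.append(f"sum(people * (country = ?)) AS {alias}")
    return (
        "SELECT month || '-15' AS day,\n  "
        + ",\n  ".join(columns)
        + "\nFROM swdata\nWHERE mission = ?\nGROUP BY month\nORDER BY month"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE swdata (month text, country text, mission text, people integer)"
)
# Placeholder rows for illustration only; these are not real deployment figures.
conn.executemany(
    "INSERT INTO swdata VALUES (?, ?, ?, ?)",
    [
        ("2003-01", "India", "MONUC", 10),
        ("2003-01", "Pakistan", "MONUC", 20),
        ("2003-01", "India", "UNMIL", 99),
        ("2003-02", "India", "MONUC", 15),
        ("2003-02", "Bangladesh", "MONUC", 5),
    ],
)

countries = ["India", "Bangladesh", "Pakistan"]
sql = build_timeline_query(countries)
# Parameters are bound in order: one per country column, then the mission.
timeline = conn.execute(sql, countries + ["MONUC"]).fetchall()
print(sql)
print(timeline)
```

Appending '-15' to the month, as in Figure 6, gives each point a mid-month date that a timeline widget can plot.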


[Timeline graph; legend at June 15, 2010: people_india 4.55 k, people_bangladesh 2.82 k, people_pakistan 3.65 k]


Figure 7: Deployment of peacekeepers from India, Bangladesh and Pakistan to MONUC over time, Source: 
ScraperWiki Data Blog 


Once the data is available, it is rendered usable in the presentation stage. This is often the most challenging
and time-consuming part. In this particular case, creating the visual presentation (Figure 8) took twice as
long as scraping the data. The result is an interactive table that allows users to select PKOs and countries
to check changes in deployment over time.


UN peace-keeping stats

Sourced from UN site by application of un peacekeeping statistics scraper.

[Screenshot of the interactive page: a timeline graph of total_people from 2003 onward (91.22 k at June 15, 2013),
selection lists for “Contributor nations” and “Peace-keeping missions”, and a “Top contributions” list:
India -> MONUC (4392), India -> MONUSCO (4248), Bangladesh -> UNMIL (3973), Pakistan -> UNAMSIL (3877),
Ethiopia -> UNISFA (3818), Pakistan -> MONUC (3771).]

Figure 8: UNPKO statistics, Source: UN Peacekeeping Statistics


7 Conclusion


In this paper we described the uses and process of extracting data from United Nations documents. As 
awareness and interest in data increases we can expect to see more datasets available online in downloadable 
formats, although we can anticipate a backlog in making older data available. Data scraping allows users 
to convert statistics available in PDF or Word documents into useable data. This data can be used to 
answer questions that are quantitative in nature, even when the data creator (in our case the United 
Nations) does not make its data publicly available.


Abstract

Digital textbooks are becoming increasingly popular in schools throughout the world. Simultaneous 
mandates to adopt digital textbooks in the country of South Korea and the U.S. state of Florida provide 
an opportunity to study how culture and context might impact this implementation from the perspective 
of the school librarian who serves multiple roles when new technology is introduced. In this study the 
Concerns-Based-Adoption-Model (CBAM) was used to identify the concerns of school librarians in 
Florida about their potential role in the implementation of digital textbooks and how personal levels of 
adoption relate to these concerns. Results indicate that innovators and early adopters have higher levels 
of concern, which are more substantive and practical, while late majorities and laggards have more vague 
uneasiness and lurking anxiety. Regardless of the speed at which they adopt innovations, school librarians have
similar levels of personal concern.


1 Introduction


Digital textbooks are becoming increasingly popular in schools throughout the world. This trend 
accompanies a commitment of schools to facilitate 21st century learning and an intensified presence of the 
Internet and digitally rich materials in classrooms. There are also perceived advantages of digital textbooks 
such as increased comprehension, differentiation, cost-savings, health and safety, and protection of the 
environment (Mardis et al., 2010). Simultaneous mandates to adopt digital textbooks in the country of
South Korea and the U.S. state of Florida provide an opportunity to study how culture and context might 
impact this implementation from the perspective of the school librarian who serves multiple roles when new 
technology is introduced. 


1.1 Digital textbooks in Florida and South Korea 


In the United States, Secretary of Education Arne Duncan has called for a transition from printed textbooks 
to digital ones as quickly as possible, noting that printed textbooks should be obsolete in a few years and that 
digital textbooks will enable the U.S. to keep up with other countries “whose students are leaving their 
American counterparts in the dust” (ALA, 2013, p. 32). Florida is one state that is heeding this imperative. 
In June 2011, Florida’s governor signed a bill requiring all public schools in the state to use entirely digital 
textbooks and assessments by 2015. One Florida school already serves as a digital textbook leader: 
since 2010, Clearwater High School in Pinellas County has run a 1:1 initiative, placing an e-reader in 
the hands of each of its 2100 students. 

Implementation of digital textbooks began even sooner in South Korea. On March 8, 2007, based 
on ‘The Plan for Commercializing the Digital Textbook,’ the Education Ministry of South Korea announced 
that they would develop and apply digital textbooks for 25 K-12 courses by 2011 (KERIS, 2007). Since 
then, South Korea has played a pivotal role in leading changes of teaching practice by integrating digital 
textbooks into schools (Kim & Jung, 2010). Subsequent to a pilot test of digital textbooks in 132 model 
schools between 2007 and 2011, the Ministry of Education, Science and Technology declared there would 
be a nation-wide mandated implementation of digital textbooks by 2015 (KERIS, 2011). A plan to use 
digital textbooks at hundreds of elementary and middle schools around the country during the 2014 school 
year was scaled back to allow a better understanding of pilot test results and public opinion, and full 
enforcement is to be determined in the first half of 2015. 


1.2 The potential roles of school librarians in digital textbook implementation 


Formal roles have yet to be identified for school librarians in digital textbook implementation in either the 
U.S. or South Korea. However, in a 2010 white paper (Mardis et al., 2010) some potential roles have been 
envisaged. The authors note that: 


Digital textbooks represent another opportunity for school librarians to enhance their vital 
leadership in teaching and learning. Librarians, of course, are experts at identifying, collecting, and 
organizing the best content, free or for a fee.... In an age when many school librarians are not sure 
about the continued relevance of their promotion of reading and love of books, ebooks and digital 
textbooks may represent a fresh way to continue advocacy for the importance of reading as well as 
for the school librarian's crucial leadership role in technology integration (p. 14). 


Similarly, in South Korea, school librarians’ roles with digital textbooks are not agreed upon, although the 
Library Act (Ministry of Culture, Sports and Tourism, 2007) codifies traditional school librarians’ roles 
very clearly: 


• Collection, organization, preservation, and provision of services of materials necessary for school 
education; 

• Combined administration and provision for use of the educational materials kept by a school; 

• Development, manufacture, and provision for use of audio-visual materials and multimedia 
materials; 

• Construction of the information sharing system utilizing information management system and 
communication network and provision for use of such system; 

• Education of information utilization through the instruction of library use, education on reading, 
cooperative teaching, etc.; and 

• Other duties necessary for the execution of functions as a school library. 


Moreover, because most school librarians currently are in charge of providing print textbooks, they may 
potentially continue this duty with digital textbooks in their schools, which could also be the case in Florida. 
In both countries, the school librarians’ role in implementation of digital textbooks has other leadership 
implications (Everhart et al., 2011). 

Given the potential for school-wide impact, coupled with either a state or national mandate for 
this new technology, it was reasoned that school librarians would be concerned about the implementation 
of digital textbooks. This preliminary study sought to answer the following research question: What are 
school librarians’ concerns about the implementation of digital textbooks in Florida? Further study will seek 
to determine the concerns of school librarians in South Korea and to examine those concerns in light of the 
culture and context in which they occur. 


1.3 Methodology 

In order to ascertain Florida school librarians’ early-stage concerns about digital textbooks, a study was 
conducted using the Concerns-Based Adoption Model (CBAM), which is widely used as both theory and 
methodology to identify an individual’s concerns and level of use when implementing innovations and new 
technologies. Concern is defined as “the composite representation of the feelings, preoccupation, thought, 
and consideration given to a particular issue or task” (Hall & Hord, 1987, p. 58). A concern is a psychological 
action based on personal make-up, knowledge, and experience when a person faces new phases or 
environments and needs improvement or changes (Hall & Hord, 1987). 

Hall and Hord (1987) note that this theory rests on several assumptions: (a) change is a process, 
not an event; (b) an individual accomplishes change; (c) the change process is an extremely personal 
experience; (d) individuals go through various stages of change; (e) the availability of a client-centered 
prescriptive model can develop the individual’s capability with staff development; and (f) these changes 
need to be monitored in an adaptive and systematic way. Along with these assumptions, CBAM includes 
three main concepts: 1) Stages of Concern (SoC), which is the main idea employed by this study; 2) levels 
of use (LoU); and 3) innovation configurations (IC) (Anderson, 1997). 

This study employs the SoC in particular, assessed by means of the Stages of Concern Questionnaire 
(SoCQ). The SoC comprises seven stages, which are not exclusive of one another, although each has 
distinctive characteristics (Table 1). The model assumes that when individuals encounter something new, 
they have concerns at all stages but are more involved in one particular stage. The SoCQ also enables 
researchers to determine a person’s level of adoption of an innovation. 


Stage               Concern 
0 Awareness         Little concern or involvement with the initiation 
1 Informational     Gains more information about the innovation 
2 Personal          Uncertainty about the personal desideratum toward the new program 
3 Management        Managing skills such as scheduling and integrating 
4 Consequence       Attempts at innovation on students 
5 Collaboration     Shares interests with others in the new program 
6 Refocusing        Focuses on pursuing more benefits of the innovation or exploring an alternative program 


Table 1: Stages of Concern 


The survey was conducted during October 2012 using the online version of the SoCQ, which 
is an encrypted survey. Researchers recruited 170 participants through email promotion to members of the 
Florida Association for Media in Education (FAME). A two-part instrument was administered, consisting of 
the Stages of Concern Questionnaire (SoCQ) (35 items) and a demographic survey (16 items). In the SoCQ 
section, the term ‘innovation’ was replaced with ‘digital textbooks.’ The percentiles of the 
different levels of concern were calculated by matching the average score of each stage to the established 
percentiles. 
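
The scoring step described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually used in the study: the 0 to 7 rating scale, the item-to-stage assignment (`ITEM_TO_STAGE`), and the lookup function (`placeholder_percentile`) are hypothetical stand-ins, and a real analysis would substitute the published SoCQ scoring key and norm table.

```python
from statistics import mean

N_STAGES = 7          # Stages 0 to 6
ITEMS_PER_STAGE = 5   # 35 items in total

# ASSUMPTION: item i is assigned to stage i % 7. This is NOT the published
# SoCQ scoring key; it only stands in for it in this sketch.
ITEM_TO_STAGE = [i % N_STAGES for i in range(N_STAGES * ITEMS_PER_STAGE)]


def stage_raw_scores(responses):
    """Sum one respondent's 35 item ratings (assumed 0-7) into 7 stage scores."""
    if len(responses) != len(ITEM_TO_STAGE):
        raise ValueError("expected 35 item responses")
    totals = [0] * N_STAGES
    for item, rating in enumerate(responses):
        totals[ITEM_TO_STAGE[item]] += rating
    return totals


def group_profile(all_responses, percentile_lookup):
    """Average each stage's raw score over respondents, then look up a percentile."""
    per_person = [stage_raw_scores(r) for r in all_responses]
    profile = []
    for stage in range(N_STAGES):
        average = mean(person[stage] for person in per_person)
        profile.append(percentile_lookup(stage, round(average)))
    return profile


def placeholder_percentile(stage, raw_score):
    """PLACEHOLDER: linear rescale of 0..35 to 0..100, not the established table."""
    return round(100 * raw_score / 35)


# Two made-up respondents, used only to show the call pattern.
print(group_profile([[4] * 35, [2] * 35], placeholder_percentile))
```

Passing the lookup in as a function keeps the established percentile table separate from the averaging logic, so the placeholder can be swapped out without touching the rest of the sketch.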


2 Conclusion 


Overall, the concerns profile of this sample of Florida school librarians regarding digital textbooks reflects 
the pattern of the typical nonuser in the SoCQ profile (Hall, George, & Rutherford, 1977). The profile is 
shown in Figure 1. The nonuser concerns profile is the most common, having the highest values on Stages 0, 1 
and 2 with relatively lower values on Stages 4, 5 and 6. Furthermore, the nonuser profile demonstrates that 
school librarians’ concerns about digital textbooks are very much in the initial phase. If the innovation is positive 
and there is proper support for implementation, the plotted concerns profile wave will progress from left to 
right over time, since the SoCQ hypothesizes that individuals develop their concerns progressively over 
the course of the stages (Hall & Hord, 1987). This profile reflects the current status, given that 
the State of Florida has only mandated the use of digital textbooks and has gone no further. 

Another notable finding is that this profile can reasonably be assumed to reflect potential 
resistance to the implementation of digital textbooks. Hall, George, and Rutherford (1977) identified two 
signs of a negative nonuser concerns profile, which indicates potential resistance: Stage 2 being more 
intense than Stage 1, and a tailing up of Stage 6. The digital textbook profile (Figure 1) shows the first 
of these signs. The percentile of Stage 2 (78%) is slightly higher than that of Stage 1 (75%), indicating 
that school librarians are more concerned at the personal stage than at the informational stage. The 
participants expressed more concern over personal impact and responsibility with regard to digital textbooks 
than over gathering substantive information about digital textbooks. Of particular interest, the high 
Stage 2 concerns profile suggests a level of disagreement or doubt regarding the implementation of digital 
textbooks. Stage 2 concerns should be reduced before the individual can appreciate the 
coming innovation (Hall, George, & Rutherford, 1977). 
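
The comparison between Stage 2 and Stage 1 can be written as a small check. The sketch below is an illustration only: the function name `negative_nonuser_signals` is hypothetical, and treating "tailing up" as Stage 6 exceeding Stage 5 is an assumption made here for simplicity, not a definition taken from the SoCQ literature. Only the two percentiles reported in the text (75% and 78%) are used as input.

```python
def negative_nonuser_signals(profile):
    """Report which signs of a negative nonuser profile are present.

    profile maps a stage number to its percentile. Stages missing from
    the mapping are simply not evaluated.
    """
    signals = {}
    if 1 in profile and 2 in profile:
        # Sign 1: personal concerns (Stage 2) exceed informational ones (Stage 1).
        signals["stage2_above_stage1"] = profile[2] > profile[1]
    if 5 in profile and 6 in profile:
        # Sign 2 (ASSUMED operationalization): Stage 6 is higher than Stage 5.
        signals["stage6_tails_up"] = profile[6] > profile[5]
    return signals


# Percentiles reported in the text for Stages 1 and 2.
reported = {1: 75, 2: 78}
print(negative_nonuser_signals(reported))
```

With only Stages 1 and 2 supplied, the check reports the first sign and leaves the Stage 6 comparison unevaluated.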


[Figure 1 chart: percentile of concern (0% to 100%) for each stage: 0 Unconcerned, 1 Information, 
2 Personal, 3 Management, 4 Consequence, 5 Collaboration, 6 Refocusing. Legible values: Stage 0, 91%; 
Stage 1, 75%; Stage 2, 78%.] 


Figure 1: Stages of Concern Profile for School Librarians’ and Digital Textbooks 


It is interesting to note that school librarians in Florida, who are all in the same situation concerning digital 
textbook implementation, have different stages of concern about it. Although all respondents share 
the highest concerns on Stage 0 and the lowest concerns on Stage 4, Figure 2 clearly 
demonstrates the differences among the five adopter categories (innovators, early adopters, early majority, 
late majority, and laggards). School librarians who may adopt digital textbooks 
quickly have higher stages of concern about the innovation. Conversely, the more conservative school 
librarians are regarding the innovation, the lower their stages of concern. This seems to indicate that innovators and 
early adopters have higher levels of concern, which are more substantive and practical, while late majorities 
and laggards have more vague uneasiness and lurking anxiety. What is unique is that regardless of how 
quickly they adopt the innovation, school librarians have similar levels of personal concern (Stage 2). The 
biggest gap among the different adopter categories is at Stage 5, the collaboration stage. The innovators and 
early adopters have many more concerns in Stage 5 than the other categories do. The more quickly school 
librarians embrace new technology, the more they worry about how to share digital textbook initiatives, pool 
knowledge, prepare new technology, and work together. 

Figure 2 also demonstrates future potential resistance in terms of a negative nonuser concerns 
profile. Although across every adopter category the intensity of concern at Stage 6 is lower than at 
Stage 5, for three categories (innovators, early adopters, and early majorities) the intensity at Stage 
2 is higher than at Stage 1. This implies that, as mentioned above, there will be negative reactions to 
digital textbook implementation, and that the most innovative groups will lead this resistance. 
[Figure 2 chart: stages of concern plotted for each adopter category: Innovator, Early adopter, Early majority, 
Late majority, Laggards.] 


Figure 2: Stages of Concern by Categories of Adopters 


2.1 Culture and Context 


The concerns school librarians have about the implementation of digital textbooks in Florida and South 
Korea provide an opportunity to study this innovation in a cross-cultural context. Currently, preliminary 
interview data are being gathered from South Korean school librarians leading up to a dissertation study 
comparing CBAM results from both countries. There are cultural similarities and differences that will make 
for interesting analysis. In both countries the governments have issued a mandate but have not provided 
support, either financially or via professional development, for schools to transition to digital textbooks. In 
South Korea, there are currently pilot sites being studied, whereas in Florida there are none. In the U.S., school 
librarians are considered equal to teachers in schools, whereas in South Korea school librarians are not on 
the same level as teachers but instead are considered support personnel. In both cultures there is a tradition 
of respect for the print book, but it is stronger in South Korea. Before the post-liberation education of the 
early 1990s, Korean school culture was rooted in Confucianism, which has a decisive effect on the Korean 
culture. Due to this tradition, Korea has maintained a national curriculum with uniform standards (So, 
Kim, & Lee, 2012), which is only now being implemented in the U.S. vis-à-vis the Common Core Standards. 
In order to analyze culture and context, which influence individual school librarians’ concerns about digital 
textbook implementation, Boyd’s (1992) conceptual framework of school ecology and school culture will be 
applied. 

The next few years will be important ones for school librarians to identify their roles in digital 
textbook implementation. While preparing for these new types of textbooks, school librarians may have 
various concerns. Their perceptions will also need to be evaluated so that educational policy makers and 
school administrators can identify areas requiring further attention and resources in terms of school libraries. 
These concerns can be molded into strategies for successful implementation. CBAM can 
be used as an initial step to study this early phase and again in subsequent years, as digital textbooks are 
actually employed, to track how concerns change. In the long run, conducting longitudinal research to see 
how school librarians’ concerns over digital textbooks have changed will help to identify the patterns of 
how school librarians adopt the innovation and provide more effective ways to support the changes. 
Abstract 

This study explores the wider institutional discourse from which stereotypes of librarians emerge. The 
findings shed light on the discursive practices (e.g. the methods, the rituals, and the interactions) that 
take place between students and library staff, and how these inform the student’s perspective and 
experience of the librarian stereotype. This study utilizes the theoretical framework presented by Radford 
and Radford’s study entitled “Libraries, Librarians, and the Discourse of Fear” published in 2001 which 
argued that it is within this discourse that negative stereotypes of the librarian emerge. The students 
strongly expressed a fear of the library environment and atmosphere, and being intimidated by a 
librarian’s knowledge but collectively the students did not report being scared of librarians. Rather, they 
viewed librarians as meek and feeble, which, despite being a negative stereotype, is not the dominant 
one that persists in popular culture. 

Keywords: librarian stereotypes, librarian image, discourse of fear, librarian myth, discourse analysis 

Citation: Kalsi, A. (2014). Pervasive Myth or Pop Culture Relic? College students’ Experience of the Librarian Stereotype. In 
iConference 2014 Proceedings (p. 598-604). doi:10.9776/14256 

Copyright: Copyright is held by the author. 

Contact: kalsia@email.sc.edu 


1 Introduction 


Few professions can match the preoccupation with self-image (and subsequent anxiety) that experts within the 
Library and Information Science (LIS) field contend with on an almost daily basis in everyday situations 
such as being in a restaurant or bar or responding to questions (from friends as well as strangers) about 
their occupation. Those of us who work in the LIS profession are no strangers to comments 
such as “Where’s your bun?”, “You don’t dress like a librarian”, and “I like to read, I should be a librarian 
too!” Admittedly, such comments can be amusing, but they also remind us of the pervasiveness of the 
librarian stereotype, which continues to demean the LIS profession and hinder it in attaining standing and 
recognition amongst other professions. Stereotypical images of librarians are perpetuated by media and popular 
culture (e.g. movies, novels, children’s stories), and perhaps those within the LIS profession are also culpable for 
not having done enough to combat the stereotype. Cullen (2000) warns: 


“If future politicians, university deans, and other fund managers are brought up on a diet of popular 
movies and TV shows that never realistically portray the services librarians offer, none of them will 
value our skills and expertise enough to keep us in business” (p. 142). 


However, it is not the intention of this study to investigate or indeed, bemoan the representations of 
librarian stereotypes in popular culture. In order to better understand the persistence of librarian stereotypes 
we must turn our attention to the wider cultural discourse from which such stereotypes emerge and flourish. 
These stereotypes are situated within the structures of what Radford & Radford (2001) have termed ‘the 
discourse of fear,’ that is, the institutional practices, speech and symbols that seek to control discourse. 
The Radfords’ approach to discourse is directly influenced by Michel Foucault’s notion that all discourse 
is controlled by the imposition of certain discursive practices within institutions. The interactions between 
library staff and patrons provide a variety of examples where the adopted roles reify the image of the 
librarian as knowledge keeper (and therefore holder of power) and patron as a subordinate. For instance, 
the reference desk librarian acting as the keeper of order, not only of the stacks but also of noise levels on the 
floor, exemplifies such practices, from which stereotyped characteristics such as the stern or officious librarian 
emerge. These stereotypes also extend to the perceived work of librarians as largely clerical in nature, where 
the utmost obsession of the librarian is orderliness. 

The overarching purpose of this study is to examine how the discourse of fear unfolds within an 
actual library setting through close examination of staff-patron interactions and patron perceptions. This 
study seeks to investigate the persistence of librarian stereotypes amongst college students through 
observing and documenting college students’ experiences of interacting with librarians and non-professional 
library staff. It is only through documenting patrons’ experiences and attempting to understand their 
perceptions of LIS professionals that we can begin to address how the institutional practices might 
contribute to the pervasiveness (and perhaps persuasiveness) of the librarian stereotype. I argue that the 
meaning behind the stereotype is not to be found in popular representations of librarians but within the 
discursive practices and symbols of the profession itself. 


2 Excerpt from Literature Review 


It is perhaps no surprise that virtually all the literature addressing the image of the LIS profession or LIS 
professionals comes from LIS publications. The existing scholarship addressing the librarian stereotypes can 
broadly be categorized into two distinct approaches: (1) studies examining popular representations of 
librarians in media; and (2) studies calling on LIS professionals to adopt more effective 
marketing strategies to counteract the impact of negative stereotypes. There are others but not nearly 
enough to form a distinctive third category. Both the identified categories of scholarly work have been 
useful in this study for helping identify what the prevalent stereotypes are as well as offering suggestions 
for combating the stereotyped images. 


3 Method 


3.1 Theoretical approach 


This study adopted a panoramic understanding of what is meant by discourse and aimed to understand the 
term beyond the confines of spoken and written communication. Libraries are treated very much as 
institutions with their own unique discourse. This discourse is accessible to the researcher by examining the 
discursive practices, methods, rituals, and interactions that take place in the institution of a library. The 
theoretical framework for this research was largely inspired by Radford and Radford’s study entitled 
“Libraries, Librarians, and the Discourse of Fear” published in 2001. The Radfords’ central argument is 
that the ‘discourse of fear’ is a “universal and totalizing organizing principle that gives the library its place 
in modern cultural forms” (2001, p. 323), and it is from within this discourse that negative stereotypes of 
the librarian emerge. 

This study seeks to explore the wider institutional discourse from which the librarian stereotypes 
emerge. By conducting observations of two library reference desks I wanted to shed light on these discursive 
practices and capture the students’ perspective and experience of the librarian stereotype. The key research 
questions driving the data collection and analysis were: 


• What are the discursive practices/symbols/language that reinforce the librarian stereotype? 

• How is the stereotype experienced by students, if at all? 

• To what extent do the observed activities, staff-student interactions and student experiences reflect 
the discourse of fear? 
3.2 Data collection 


3.2.1 Focus Group 


There were 7 participants in the one-hour session, plus the moderator. I began the focus group by briefly 
introducing myself and giving a 5-minute overview of the research project. The participants were all USC 
students (all American nationals; 1 grad, 6 undergrads; 2 white males, 4 white females and 1 black female). 
I had advertised free pizza as an incentive for students to participate. The focus group was recorded using 
a laptop PC sound recorder placed in the middle of a table, and later transcribed. 


3.2.2 Field notes from Two Observation sessions 


Two separate observations were conducted at two different locations: The circulation desk of the Business 
Library and the Government Information Reference Desk at the Thomas Cooper Library. Each session 
lasted 2 hours and 15 minutes, and detailed field notes were taken. 


3.3 Data analysis 


One of the fundamental theoretical assumptions of this study was that the librarian stereotypes emerge 
from a ‘discourse of fear’ and so my initial analytical categories were adapted from the Radfords’ 2001 
study. However, I broadened and neutralized these constructs, so for example, the ‘humiliation of the user’ 
became ‘interaction’ which was all about documenting the staff and student interactive experience whether 
positive or negative. The librarian as ‘formidable gatekeeper’ construct was broadened to ‘librarian.’ I also 
had a separate category for ‘discursive practices’ that captured the methods and rituals that unfolded. The 
library as ‘other worldly’ and ‘cathedral’ were merged to create the category of ‘surroundings’ which would 
help contextualize comments about the building and atmosphere within the library. 


Category Name          Description 
Interaction            Transactional; positive or negative experience 
Librarian              Experience of the librarian stereotype; physical appearance; personality 
                       traits; knowledge (i.e. gatekeeper) 
Discursive practices   Daily work practices; the work witnessed during observation and by 
                       students; the rituals; the methods of a librarian’s practice 
Surroundings           Building features; general environment and atmosphere experienced 


Table 1: Summary of categories adopted for coding data. 


Using the above categories, I studied students’ comments and then examined the discourse (viewpoints 
expressed, language, the interactions and discursive practices) captured in my data by conceptualizing the 
extent to which the data supported or negated the ‘discourse of fear’ thesis underpinning this study. These 
categories capture the essence of the institutional discourse of a library, and it is within these realms that 
I believe the stereotype of the librarian emerges. 


4 Results and Discussion 


4.1 Interaction 


Radford and Radford assert that “within the discourse of fear, the librarian is also portrayed as a fearsome 
figure, and as one capable of handing out punishment in the form of public humiliation” (2001, p. 313). 
However, all of the interactions observed between library staff and students were friendly and polite; there 
were no overt acts of public humiliation. During the observation of the Government Information reference 
desk, there were two instances where students were told politely not to put reels back into the drawers because 
the library wanted to track usage. Mike (the reference librarian) in a slightly raised voice said “excuse me, 
you don’t need to put it back, we need to track which ones get most used, so if you just leave it on that 
[Mike points] trolley.” In both instances, the students apologized and looked embarrassed when told, 
presumably because the librarian had brought attention to the offending student in front of others, 
accentuated by the silence in the reading room. Regardless of the politeness of Mike’s instructions, all 
interactions that took place at the reference desk were one-way instructions from the librarian on how to 
do something, be it operating the microfilm readers, locating the microfilm, or searching the databases. 
However, the interaction was always friendly and Mike was always attempting to make the students more 
comfortable by engaging in conversation with them. 

The observations made at the Business Library circulation/reference desk reflected the type of 
experiences reported by students in the focus group. Throughout the observation, there was minimal 
staff-student interaction despite the library being busy. Furthermore, the interactions that did take place were 
quite transactional in nature. For example, the overwhelming majority of interactions witnessed involved 
students coming up to the desk to ask about borrowing stationery or to make some other non-reference 
query. All except one student in the focus group described their interactions with library staff in 
a way that resembled a simple transaction, which also seemed to contribute to the students’ lack 
of understanding of the librarian’s profession. This is exemplified in the following statement from a 
participant in the focus group: 


Gerry: Generally, I mean, honestly, uh, I know it’s a profession, it’s a highly specialized profession, 
but I kind of associate the same way I would, like, uh, a clerk, a store clerk at Publix, you know. 
I try to find something myself, can’t find it, I’ll ask them, um, and I use the exact same tone, exact 
same manner with them as I would with a librarian. 


This response highlights one of the major difficulties experienced by the profession, which is that it is not 
as highly regarded as other professions. The comparison of a librarian to a worker at a Publix supermarket 
suggests this student had never experienced any specialized skills demonstrated by a librarian. This 
was the case with most of the students in the group: only one person had knowingly dealt with 
a qualified librarian (the others admitted to being unable to distinguish between library workers and qualified 
librarians). 


4.2 Librarian 


Of the five library workers observed, only two did not conform to some or all of the physical attributes 
associated with the stereotype. In fact, Helen and Mary did not conform to the stereotypical image apart 
from the fact that they were both women (incidentally both were fully qualified librarians). They were 
dressed quite fashionably, in business attire, wore make up, and high heels (as opposed to “sensible shoes” 
worn by the frumpy librarian stereotype), and crucially they didn’t wear glasses. On the other hand Jennifer, 
Monica and Mike did reflect some physical attributes of the stereotype. For example, all three wore glasses 
and were dressed quite plainly and unfashionably. This stereotypical physical image of the librarian was 
something that most of the students said they had experienced with some of the staff at the USC library. 
The following exchange about a library worker illustrates the students’ experience of the stereotype: 


Gerry: He doesn’t blink. 
Gerry: I mean, he’s like a creepy librarian 


Gerry: Yeah, he does hunch a lot, which makes him look shorter, and he wears his pants really high 
up. 
Danielle: He walks kind of funny. He walks slow. 


Carter: He has a strange way of speaking too. 
Danielle: He has an interesting voice. His eyes are kind of dead. Anna says that he’s “nice” 


Describing the library worker as “creepy” and “odd” fits into the discourse of fear in that the library (and 
librarians) are “strange” or even ghostly. However, almost none of the students expressed a ‘fear of the 
librarian’ or spoke of feeling intimidated. The students did not concur with the image of a stern or bossy librarian; 
instead, the language they used suggested a view of the librarian as feeble or weak. For example, one 
participant said: 


Emily: I think I don’t feel intimidated. If anything, I think I have this idea that even, like, the men 
there are, like, meek. [Several people laugh] Like, I could beat you in a fight. Give me the book I 
need. [Laughs] I don’t know. 


This quote reflected the consensus in the group, although one participant admitted feeling intimidated by 
how knowledgeable the librarian was rather than being scared of a librarian. Others also expressed views 
which conformed to the stereotype of librarians always being women, and that the men in the profession 
were effeminate or “not into sports.” Another student’s primary association with the library staff was 
arguing about fines. This fits within the idea of the discourse of fear, where the librarian as rule keeper and 
issuer of punishment is one of the primary experiences that shape a user’s perception. A widely held 
misconception was illustrated by one of the participants who stated “I always ask the librarian what’s the 
best book to read. It’s a library! [laughs].” This once again confirms the stereotype, associating the librarian 
with bookishness and not displaying an appreciation for the varied nature of the profession. 


4.3 Discursive practices 


The majority of students coming to the circulation/reference desk did so to use the stapler, ask for 
stationery, or ask about a computer issue. In the only reference query during the observations in the 
Business Library, a student asked the librarian “did you study business because you really know what 
you're talking about?” This example once again highlights the stereotyped association of the profession with 
being primarily administrative or clerical in nature, and not very specialized. However, it did not fit into 
the stereotype of a formidable and knowledgeable librarian. 

In my observations of a large number of reference queries at the government information reference
desk, every query followed the same ritual: the student comes to the librarian for
information. It is by default a practice in which the librarian adopts the role of knowledge-keeper and
disseminator, and the student acts as knowledge-seeker. The adoption of these roles is inherent to the
structure of a reference inquiry and encapsulates the power relationship, as well as the intimidation
that may result. However, most of the students had only experienced transactional interactions with library
staff, and some even said they did not think the librarian was capable of helping them.


4.4 Surroundings 


According to Radford and Radford, in popular culture “the librarian is represented, not as a person, but as 
an extension of the library itself” (2001, 313). This comment was reflected by one of the participants in the 
focus group who said that librarians “almost seem antiquated. Like, like from another time. Like they’ve 
been living in the library for centuries now.” This perspective fits within the discourse of fear notion of the 
library as a ‘mysterious’ place. Other students also used words like “tomb,” “catacombs,” “creepy,” “scary,”
“unnatural,” and “weird” to describe the surroundings. They also expressed a feeling of being intimidated
by the building. This again suggests a strong presence of the discourse of fear in the language being used
to describe the students’ experience of the library. In the focus group, the general consensus, with the exception
of two students, was that the students were uncomfortable with the silence of the library. The essence of
this is captured in the following exchanges:

Anna: And you have no cell phone reception.
Emily: You know what they call that? A dead zone
Anna: It’s like a Stephen King novel down there.


Gerry: No cell phone reception, all these, like shadowy places for like Freddy Krueger or someone
to come out
Danielle: It’s quite literally a grave. You go underground, you’re dead.


Comparisons to a horror movie-like atmosphere confirm the presence of the discourse of fear in many of
these students’ experience of the library as an eerie and even haunting environment.


5 Conclusion 


In summary, from the data collected, students strongly expressed a fear of the library environment and
atmosphere. However, although I witnessed a librarian politely telling off students during my observations,
the fear of humiliation was not something the participants in the focus group had experienced.
One person reported being intimidated by a librarian’s knowledge, but none reported being scared of
librarians. Nevertheless, the students expressed a reluctance to interact with library
staff or ask for help. I believe this is partially explained by the inherent nature of the student-librarian
relationship and interaction, which is consistent with the discourse of fear thesis. Finally, the students
had experienced various librarian stereotypes (e.g., a female-dominated profession, unfashionable dress, work
that is mainly clerical) but had rejected or not experienced the stereotype of librarians being stern and
intimidating. Rather, they viewed librarians as meek and feeble, which, despite being a negative stereotype,
is not the dominant one that persists in popular culture.

This study revealed interesting results about how students experience the librarian stereotype. 
Inevitably the limited data gathered make it harder to contextualize the results into the wider cultural 
narrative but replicating the approach and methods used in this study at different institutions could help 
generate data for enhancing our understanding of, and ultimately combating, the persistent negative 
librarian stereotype. 
Abstract

The concept of information access is central to both Library and Information Science and to human 
rights discourse and practice. This paper offers a definition of information access and proposes a relational 
understanding of it. Using a “standard threat analysis,” based on the work of political philosopher Henry 
Shue (1996), the access relation is analyzed in terms of five facets: (1) availability, (2) reachability, (3) 
findability, (4) comprehensibility, and (5) useability. It is shown how this theory can be synthesized with 
another prominent account of access (Burnett, Jaeger, and Thompson, 2008) to create a rubric to guide 
the evaluation and creation of information systems and services that satisfy the human right to
information access.
1 Introduction


As the discipline that studies “the ways that society stores, retrieves, analyzes, manages and disseminates
information” (ASIST 2014), the majority of LIS research and practice is concerned with improving people’s
access to information. To illustrate, in the 2013 iConference Proceedings, the terms “access,” “accessible,”
and “accessibility” were used 670 times (more than “computer,” “computing,” and “computable” at 553
mentions). Concern about “information access” is not limited to LIS researchers and practitioners, however.
Access to information is a human right guaranteed by Article 19 of the Universal Declaration of Human 
Rights (UDHR). Furthermore, as has been pointed out by a number of authors, access to information is a 
particularly important human right (Bishop, 2012; Britz & Lor, 2010; Byrne, 1999; Calland & Tilley, 2002; 
Jagwanth, 2002; Koren, 1997; Mathiesen, 2012; Raseroka, 2006; Sturges & Gastinger, 2010; Weeramantry, 
1994). Without access to information it is impossible to exercise many if not all of one’s human rights. For 
example, access to information is an essential component of the rights to political participation (UDHR, 
Article 21), a fair trial (Article 10), freedom of conscience (Article 18), and even to health (Article 25(1)). 

In order for LIS researchers and practitioners to improve access to information and for policy makers 
to determine how the right to information access can be fulfilled, they need to understand what “access to 
information” is. However, as has been noted by a number of researchers (Burnett, Jaeger, & Thompson, 
2008; Lievrouw, 2004; McCreadie & Rice, 1999a; McCreadie & Rice, 1999b; Oltmann, 2009), the concept of 
information access is under-theorized in LIS. Leah Lievrouw’s (2004) comment that, “‘access,’ as it relates 
to information and communication technologies, is seldom explicitly defined, even by experts” (p. 269), is 
hardly less true today than when she said it ten years ago. Most certainly compared to other key concepts 
in LIS, such as data, information, and knowledge (Bates, 2005; Capurro & Hjørland, 2003; Frické, 2009;
Furner, 2004; Hjørland, 2007; Rowley, 2007), the concept of access has received scant attention. 

This note proposes an account of information access. This account has three components: a 
definition of information access, a characterization of access as a relation, and a delineation of five facets of 
access. These facets are the constituents of access, that is, those conditions that must be met in order for
information to be accessible to some person or group of persons. This account was developed using the
philosophical methodology of conceptual analysis and the human rights concept of a “standard threat.”
Finally, it is shown how this account can be combined with another analysis of information access (i.e., 
Burnett, Jaeger, & Thompson, 2008). This synthesis can then be used to guide the evaluation and creation 
of information services and policies to ensure the human right to information access is satisfied. 


2 Related Research 


While there are numerous papers that discuss some aspect of access, to my knowledge there are only two 
fully developed accounts of the concept of access. The first is Maureen McCreadie and Ronald Rice’s (1999a) 
study of the ways in which the term “access” is conceptualized in a variety of disciplines. They found access 
used in six different senses: technology, commodity, control, participation, communication and knowledge. 
The second is a tripartite theory of access proposed by Gary Burnett, Paul Jaeger, and Kim Thompson 
(2008), which, in addition to the generally accepted categories of physical and intellectual accessibility 
(Fidel & Green, 2004, pp. 564-66), added the factor of social accessibility. (Hereafter, this will be referred 
to as the PhIS analysis.) This analysis was further elaborated by Kim Thompson and Waseem Afzal (2011), 
who added “culture” to the mix, calling the third factor “socio-cultural” access, and arguing that all three 
factors should be considered simultaneously (Thompson & Afzal, 2011, 30). Shannon Oltmann (2009) 
synthesized the PhIS analysis with McCreadie and Rice’s categorization, showing the relationships between 
the two conceptualizations and illustrating how PhIS can incorporate the conceptualizations noted by 
McCreadie and Rice. 


[Diagram: Physical Access, Social Access, Intellectual Access; Technology, Commodity, Control, Participation]

Figure 1: Synthesis of Two Conceptualizations of Information Access (From Oltmann 2009, 7).


Each of the proposed accounts of access to information was developed using a different method. McCreadie 
and Rice focused on how the term has been used in the scholarly literature, while Burnett, Jaeger, and 
Thompson developed their theory in a more intuitive manner, validating their theory by showing how it 
can illuminate various case studies. Oltmann (2013) has done further work showing that PhIS can be used 
to analyze the ways that access can be facilitated or restricted in cases involving access to scientific 
information. While each of these accounts has contributed to our understanding of information access, none 
of the existing accounts says what the constituents of access are. Such an account is needed in order to 
diagnose deficiencies in access and to guide the creation of effective interventions to improve access. The 
facets account proposed here fills this gap. 


3 Methods 


The account of information access proposed here was developed using a philosophical method; it uses a 
combination of conceptual analysis and a “standard threats” analysis based on the work of political 
philosopher Henry Shue. The goal of this analysis is to provide an account of access to answer a particular
question, i.e., what conditions would need to be fulfilled so that someone’s human right to information 
access is satisfied? How we answer this question depends partly on what it means for someone to have a 
human right. Here I rely on Shue’s (1996) analysis of human rights as protections against standard threats 
to vital interests (e.g., the interest in access to information). Shue’s standard threats approach starts from 
the premise that it is impossible for policy makers to protect against all possible threats to an interest. 
Thus, in developing an account of human rights and corresponding state obligations, the theorist ought to 
focus on those threats that are most likely to arise (typically based on past experience). A standard threats 
analysis of a right, such as the right to information access, proceeds by asking, “What are the standard 
threats to information access?” The answer to this question can then be used to delineate the facets of 
access. The account provided here was initially developed by looking at the threats to the human right to 
health information (Article 19, 2007; Parker et al., 1999; Ramsay, 2001; Warren et al., 2012; Yamey, 2008). 


4 Information Access Defined 


First, it is important to note that “access to information,” does not refer to access to an information system 
or service (i.e., a system or service that organizes and presents information), but access to information itself. 
“Information” as it is used here means, following the philosopher of information Luciano Floridi, semantic 
content (Floridi, 2013). (I differ from Floridi, however, in allowing that information may be false (Fallis, 
2009)). Information may be provided via documents or other information sources (including human beings). 
Simply starting with the dictionary definition, the term “access” has been defined as the “freedom or ability 
to obtain or make use of something” (Merriam-Webster Inc., 2004) and “the right or opportunity to use or 
benefit from something” (Oxford University Press, 2001). These definitions actually capture the core of 
what a right to information access is; thus, I suggest the following definition of information access: 


A person has access to information when he/she has the freedom or opportunity to obtain, make use of, and
benefit from that information.


It is not being suggested that this is the only possible definition of access. For different purposes one might, 
for instance, focus only on the ability to “obtain” the information, leaving aside questions of whether the
information can be used or whether the person would be able to benefit from that use. This definition is 
appropriate, however, if what we are concerned with is the right to information. The right to have access 
to health information, for instance, will not be satisfied if a person can obtain, but cannot use or benefit 
from this information. The point of the right to information is that one should be able to gain some sort of 
benefit from it. Note that this does not mean that the person must or will benefit—it is merely that she is 
capable or has the opportunity to benefit. 


4.1 Access is a Relation 


Discussions of access often point out that it does not depend merely on the availability of information or 
information technologies (be they books, computers, or cell-phones), but also on the capacities of individuals 
to effectively use these resources. The basic nature of access as a relation, however, has not been clearly 
articulated. According to the definition given above, access exists when there is at least one person and at 
least one piece of information such that the person is able to obtain, use, and benefit from that piece of
information. Thus, access is a relation between a person (or group of persons) and a piece (or complex set 
of pieces) of information. The fact that access is a relation between persons and information has important 
consequences. First, as a relation, one can never talk about “access” to information per se, one must be 
clear on who has access. What may be accessible for one person may not be accessible for another. There is 
no “accessible” full stop; information is always accessible for some person(s). Second, since access is a 
relation, to make a piece of information accessible there are two sorts of interventions one can make. One 
can effect a change in the information, or in the information system or service (e.g., by making it easier to find,
easier to understand, or easier to verify), or one can effect a change in the person or her environment (e.g., by
teaching information seeking skills, reading comprehension, or information literacy).


5 Facets of Access 


By using a standard threats analysis focusing on access to health information, we find that information may 
be inaccessible due to one or more of the following 5 factors: 


1. The information was not available. For example, persons have a right to know the prevalence of 
contagious diseases or other health threats in their areas. But, governments did not collect this data 
due to a lack of resources, incompetence, or a desire to look good (Article 19, 2007, p. 75).¹

2. The information was not findable. For example, the person seeking information on her health 
problem does not have the skills necessary to find the available information (Warren et al., 2012). 

3. The information was not reachable. For example, important medical information may be behind 
pay walls that physicians from poor countries cannot afford (Yamey, 2008). 

4. The information was not comprehensible. For example, prescription information or labels on 
essential medicines may not be in the local language (Article 19, 2007, p. 30). 

5. The information was not useable. For example, medical information is out of date or simply 
inaccurate (Article 19, 2007, p. 5). 


Putting these 5 facets in positive language, in order for information to be accessible it must be available, 
findable, reachable, comprehensible, and useable. While the feature of availability is almost entirely 
dependent on the production side of the information system, the other four factors—the ability for a person 
to find, reach, comprehend and use information—depend on both the state of the information and the state 
of the person. So, for example, it is possible to make the medical information more reachable either by 
lowering the price or by providing more monetary resources to the physician. Whether an intervention to 
improve access should be on one side of the equation or the other depends on the relative costs and benefits 
of such interventions. 
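
As an illustration only, the idea that access holds for a particular person and a particular piece of information when all five facets are satisfied can be sketched as a small data structure. The class name `AccessAssessment`, its fields, and the example values are hypothetical and are not part of the account itself; the sketch simply treats each facet as a yes/no judgment.

```python
from dataclasses import dataclass

# The five facets named in the text, in the order they are listed there.
FACETS = ("available", "findable", "reachable", "comprehensible", "useable")


@dataclass(frozen=True)
class AccessAssessment:
    """Hypothetical record of one person-information pair (access is a relation)."""
    person: str
    information: str
    available: bool
    findable: bool
    reachable: bool
    comprehensible: bool
    useable: bool

    def failed_facets(self):
        """Return the facets that are not satisfied for this pair."""
        return [facet for facet in FACETS if not getattr(self, facet)]

    def accessible(self):
        """Information is accessible to the person only if no facet fails."""
        return not self.failed_facets()


# Illustrative case modeled on the paywall example: everything holds except reachability.
example = AccessAssessment(
    person="physician in a low-income country",
    information="paywalled medical article",
    available=True,
    findable=True,
    reachable=False,
    comprehensible=True,
    useable=True,
)
print(example.accessible(), example.failed_facets())
```

Because the record is keyed to a person as well as to the information, the same article can be assessed differently for different people, which is the point of treating access as a relation.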

The terms used above focus on the perspective of the information seeker/user, but if we look at 
these factors from the perspective of the information provider we will find some familiar concepts from LIS. 


[Diagram labels: Useable, Findable]

Figure 2: Facets of Access: User Perspective and Provider Perspective


1 “Article 19” is a human rights organization based in the UK, which focuses on issues related to freedom of speech and access to 
information. 
5.1 Integrating the Facets Account and PhIS


Far from this analysis of access being a replacement for the PhIS analysis, it is perfectly compatible with 
it. While this analysis picks out those factors that constitute whether a piece of information is accessible or 
not, the PhIS analysis allows us to focus on determinants of whether and to what degree that factor is 
satisfied. Thus, for each factor we can ask what are the physical, intellectual, and social determinants of its 
satisfaction? 


[Diagram: Intellectual Determinants and Socio-Cultural Determinants bearing on Comprehensibility]

Figure 3: PhIS Analysis Applied to Facet


Combining the two analyses, we can ask 6 questions with regard to the comprehensibility facet: 


1. What physical determinants limit/enable comprehensibility on the side of the
information/system/service?
   • For example, is the copy poor (e.g., too faint or illegible)?
2. What physical determinants limit/enable comprehensibility on the side of the persons who need
information?
   • For example, does the person have a visual disability?
3. What intellectual determinants limit/enable comprehensibility on the side of the
information/system/service?
   • For example, is the material written only for subject experts?
4. What intellectual determinants limit/enable comprehensibility on the side of the persons who need
information?
   • For example, do they have a literacy deficiency in this area?
5. What socio-cultural determinants limit/enable comprehensibility on the side of the
information/system/service?
   • For example, is the material written in a way that is culturally relevant?
6. What socio-cultural determinants limit/enable comprehensibility on the side of the persons who
need information?
   • For example, do they have the cultural competence to understand information from or
about other cultures?


Similar questions can be developed for each of the facets to provide a rubric for evaluating the accessibility 
of a piece of information or set of information by some user or users. 
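
One way to picture such a rubric is to generate the same six question templates for every facet. The following is a minimal sketch, assuming the question wording used for comprehensibility carries over unchanged to the other facets; the function names `rubric_questions` and `full_rubric` are hypothetical.

```python
from itertools import product

# Facets from the facets account; determinants from the PhIS analysis.
FACETS = ("availability", "findability", "reachability",
          "comprehensibility", "useability")
DETERMINANTS = ("physical", "intellectual", "socio-cultural")
# The two sides of the access relation on which an intervention can be made.
SIDES = ("the information/system/service", "the persons who need information")


def rubric_questions(facet):
    """Return the six determinant-by-side questions for a single facet."""
    return [
        f"What {determinant} determinants limit/enable {facet} "
        f"on the side of {side}?"
        for determinant, side in product(DETERMINANTS, SIDES)
    ]


def full_rubric():
    """Return a mapping from each facet to its list of six questions."""
    return {facet: rubric_questions(facet) for facet in FACETS}


if __name__ == "__main__":
    for question in rubric_questions("comprehensibility"):
        print(question)
```

Under these assumptions the full rubric contains 5 facets × 3 determinants × 2 sides = 30 questions, each of which an evaluator would answer for a specific user group and information source.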


6 Conclusion and Future Work 


This note presents a model of information access as a relation between a person and information, constituted 
by five facets—availability, findability, reachability, comprehensibility, and useability. Each of these facets 
is complex, including a number of factors. Thus, more work needs to be done to characterize each of these
facets. More generally, work needs to be done to connect this conception of access to related work in LIS,
such as Lievrouw’s concept of the information environment (Lievrouw, 2004) as well as empirical work on
information seekers’ conceptualizations of information accessibility (Fidel & Green, 2004). In addition, while
intuitively the account appears to be applicable to information topics besides health, it will be important 
to test whether it captures all the standard threats to information access across various subject matters. 

Ultimately, the test of this conceptualization of access is whether it is useful for diagnosing access 
deficiencies and designing policies and systems to address them. This semester I am running a small test by 
asking students in my Introduction to Digital Cultures course to use this analysis to diagnose and suggest 
interventions to improve access to information for underserved populations. Each student is focusing on a 
particular underserved user group, an information content area, and a service or system, such as a library, 
Internet service provider, or website. Based on their preliminary research, they were asked to fill out a chart 
based on the model provided above, noting whether access to the information provided by that service or 
system was adequate or deficient. The students will then use this analysis to develop a concrete proposal 
for improving access for their user group and to provide a justification of this proposal. Once the final 
papers have been submitted, I will be analyzing the students’ work to determine if there are any gaps in 
the model, such as constituents or determinants of access that were not captured. 
Abstract

Library seating surveys record the use of seats in a library. They estimate library usage and are used to 
plan library spaces for future use. This paper describes a seating survey in an academic library, which 
aggregated data from 112 seat counts to generate heat maps to visualize occupancy. Triangulation of the
seating survey data with another survey on users’ perceptions of space in the library revealed an
interesting contrast between highly occupied areas that were perceived as quiet, and less occupied areas
perceived as crowded and noisy. Discussion of this finding is framed in terms of Bennett’s (2009) model
of a technology-driven paradigm shift in academic libraries from places for solo work to places for group
learning.
1 Introduction


Evaluating users’ behavior is an important part of library management. One useful evaluation method is 
the seating survey, which records the number and position of occupied seats in a library. Seating surveys 
provide insight into patron needs, library usage, and under-used and over-crowded areas (Loder, 2000). 
They can also provide insight into specific behaviors. For example, students entering a library room will
seat themselves away from other students (Fishman & Walitt, 1972); students require spaces ranging
from carrels for quiet study to tables and collaborative spaces to study with other students (Loder, 2008,
2010); and seating preference is shaped by users’ mental models of a library and their knowledge of how to
move within a library in order to find preferred environments (cf. Mandel, 2010; Van Beynen, 2010).

The research described in this paper is informed by Bennett’s (2009) model of three library 
architecture paradigms, which describes the relationships between information technologies, user practices, 
and library space. An historical user-centered paradigm is associated with early printing and movable type, 
in which libraries supported users to use a small number of scarce and expensive books. A later book- 
centered paradigm, originating in nineteenth century industrialized paper production and printing, led to 
expansive stacks of physical volumes in academic libraries, about which study areas are arranged. Computer 
technologies are now supporting an emerging learning-centered paradigm, in which users engage in solo and 
group learning with electronic resources. In this third paradigm, book stacks are less visible, and spaces for 
learning and collaboration are moving to the center of library space, for instance in the form of the 
information commons (Beagle, 2000). While a learning-centered paradigm is emerging in academic libraries, 
the physical fabric of many libraries dates from the twentieth century and supports the book-centered 
paradigm. Understanding how to identify and support a new paradigm of academic library use within 
existing building spaces is not a straightforward exercise (Nitecki, 2010). 

To address this issue, the seating survey described in this paper presents initial results from a heat 
map visualization of seating patterns in an academic library. Heat maps are data visualizations that can 
display the relationships between two sets of data in terms of color, often a spectrum, with lower values
represented by blue and higher values by red. Heat maps are used to visualize matrix information in the 
biological sciences (Wilkinson & Friendly, 2009). They can also be used to map data values onto spatial 
values, such as road traffic injuries (Hilton et al., 2011), or the gaze of Web site users (Spakov & Miniotas, 
2007; Tullis, 2007). In this study, data from 112 seating surveys were used to generate heat maps of 
occupancy in an academic library (‘the Library’). The survey took place in the context of ongoing work 
aimed at understanding whether the Library supports patrons to accomplish their goals, and what 
innovations patrons would like to see. A team of faculty, Ph.D. students, and a Library employee developed 
this survey. The instruments were refined over a number of iterations. The survey administrators were 
provided with IRB training. 


2 Methods 


The Library was constructed in the 1980s to serve a mid-sized university (‘the University’) in the United 
States. The basement includes several large open plan study areas with movable tables, desks and chairs, 
journals housed in compact storage, classrooms, computer labs, and some small study rooms. The entrance 
level has turnstiles, circulation and reference desks, public computers, DVDs, access to an adjacent fast food 
café, and various tables and chairs. The second floor houses the main stacks, study rooms, different forms 
of seating and tables, and rows of carrels. (A third floor houses a law school library but this is not generally 
accessible to Library patrons). The entrance, second, and third floors are arranged around and connected 
spatially through a large asymmetrical atrium that lets in light through a glass roof. Overall, the Library 
provides for both individual and group study with chairs, tables, carrels, and study rooms. Computer 
terminals and wireless networks provide Internet access, and students can also borrow laptops.

The survey divided each floor of the Library into a series of zones, defined as spaces that felt 
coherent in terms of furniture, activity, etc. (Figure 1). Overall, seventy-six zones were identified. Zones 
ranged in size from two small tables and four chairs, to an open area inside the library entrance, with seven 
tables, thirty-five assorted chairs, and other furniture. A symbol key was developed to describe the furniture 
within a zone (Figure 2). The zones on each floor were given numerical identifiers that supported a 
sequential ‘sweep’ of the Library (Figure 3) (cf. Given and Leckie, 2003). A number of pilot surveys were
carried out, before the survey instrument was administered by a Library staff member, who walked the 
Library and recorded all seated users. A walk-through typically took about 45 minutes. A total of 112 
surveys were carried out. The resulting seat counts were entered into a spreadsheet. 

The average occupancy of each zone was calculated as a percentage as follows:


\[
\frac{\text{average recorded occupancy}}{\text{maximum potential occupancy}} \times 100 = \text{average occupancy rate}
\]


Thus, a zone with five seats and an average occupancy of 1.5 users was recorded as having an average
occupancy rate of 30%. The percentage values were then converted to an RGB color value, ranging from
red (255, 0, 0), representing 100% occupancy, to blue (0, 0, 255), representing 0% occupancy. For instance,
an occupancy rate of 30% generates a rounded RGB value of (77, 0, 179). RGB values were calculated for
each of the seventy-six zones in the Library, and a plan of each floor was overlaid with colored shapes for
each zone based on the RGB values. The result was a series of heat maps of Library occupancy (see Figures
4, 5, and 6).
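
The calculation described above can be sketched in a few lines of Python. This is an illustration rather than the spreadsheet procedure actually used: it assumes a linear interpolation between blue and red with the green channel fixed at zero, and rounding of halves upward, both of which are inferred from the worked example of 30% mapping to (77, 0, 179). The function names are hypothetical, and exact fractions are used so that values such as 76.5 round predictably.

```python
import math
from fractions import Fraction


def _exact(value):
    """Convert a number to an exact Fraction (via its decimal string form)."""
    return value if isinstance(value, Fraction) else Fraction(str(value))


def occupancy_rate(avg_occupancy, max_occupancy):
    """Average occupancy rate of a zone, as a percentage."""
    return _exact(avg_occupancy) / _exact(max_occupancy) * 100


def _round_half_up(x):
    """Round a non-negative Fraction to the nearest integer, halves upward."""
    return math.floor(x + Fraction(1, 2))


def rate_to_rgb(rate_percent):
    """Map 0% to blue (0, 0, 255) and 100% to red (255, 0, 0), linearly (assumed)."""
    share = _exact(rate_percent) / 100
    red = _round_half_up(share * 255)
    blue = _round_half_up((1 - share) * 255)
    return (red, 0, blue)


if __name__ == "__main__":
    rate = occupancy_rate(1.5, 5)   # five seats, 1.5 users on average
    print(float(rate), rate_to_rgb(rate))
```

For the zone in the worked example, this yields a rate of 30% and the color (77, 0, 179), matching the values given in the text.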


3 Results 


The heat maps provided a visual synopsis of overall occupancy patterns in the Library. Frequently occupied 
zones appeared as islands of orange or red in otherwise blue and green settings. The initial visualization 
made it easy to identify areas in the Library that were more or less crowded than others. For instance, the 
hexagonal workstations on the entrance level appeared as highly occupied (Figure 5), as did the carrels on
the second floor (Figure 6), while the basement appeared to have a relatively low occupancy (Figure 4). 
These findings correlated with other occupancy studies carried out in the Library. 

These results were generally expected. When the heat maps were correlated with a face-to-face survey, 
which had interviewed ninety-eight users about their perceptions of the Library as a place to study (Khoo 
et al., 2013), some of the findings were confirmed. In one example, a zone with high occupancy in Figure 5
is that of the Reference Hub on the entrance level, which has six hexagonal workstations, each with six 
computer carrels. The Hub is used for quick work, checking email, accessing the Web, etc., and the space 
survey often recorded users standing around seated users of the workstations. Face-to-face survey responses 
understandably described this area as “Normally filled with students,” and “Convenient to access, but often 
loud and crowded. Hard to find a computer.” Contrary to initial expectations, however, there was sometimes 
no direct correlation between occupancy and perceptions of noise. One area with high occupancy was the 
second floor carrels, along the walls of the library between the stacks and the windows, with power outlets 
built into their frames (Figure 3). These were also perceived as frequently occupied, but quiet; comments 
included “Some of the quietest and most relaxing spots,” “Nice place for quiet study and get books to read,” 
and “Use when CANNOT be distracted. This is my hiding area.” Conversely, there were areas in the Library 
which were less occupied, but which were also perceived to be noisy and crowded. One example here is that 
of the basement, an area that includes several open areas with multiple tables and chairs (Figure 4). 
According to the heat map, many parts of the basement had average occupancy rates of 20-40%. However, 
when triangulated with the face-to-face survey data, a different picture emerges, that of a more crowded 
area. The group study areas in particular, while popular with some students, also drew negative comments 
from others: “Too many people”, “Don’t use because of congestion”, and “Avoid. Noisy. Kids playing 
around” were typical comments about this space. These latter findings suggest the need for further research. 


Figure 1: An example of a zone plan (entrance level, Zone 12), showing two tables with chairs and a 
photocopier. 

Figure 2: Coding key with different seating types. Key labels include carrel, laptop, chair, and upholstered. 

Figure 3: Example floor plan, with zones arranged in order. In this case, the ‘sweep’ would follow a roughly 
anti-clockwise direction from the front entrance. 

Figure 4: Prototype heat map for the basement level. Relatively high occupancy areas are in red, 
relatively low occupancy areas in green/blue. 

Figure 5: Prototype heat map for the entrance level of the Library. Relatively high occupancy areas are in 
red, relatively low occupancy areas in green/blue. 

Figure 6: Prototype heat map for the second level of the Library. Relatively high occupancy areas are in 
red, relatively low occupancy areas in green/blue. 


4 Discussion 


At the start of the surveys, it was expected that the more occupied spaces from the seating survey would 
be described as busy in the face-to-face survey, and less occupied spaces would be described as quiet. 
However, the data sometimes showed the reverse, with some highly occupied areas perceived as quiet, and 
other less occupied areas perceived as crowded and noisy. 

This finding prompted reflection on how users perceive occupancy in different seating contexts. For 
instance, the second floor carrels were often fully occupied, but perceived to be quiet. A carrel has clear 
boundaries, and it is easy to see (and to record) whether it is fully occupied or not. In Bennett’s terms, 
carrels are a ‘second paradigm’ form of library space, dedicated to quiet solo work, and arranged around 
physical books. Conversely, in the basement, while the occupancy of the seats and tables used for group 
work was recorded at approximately 25%-40% — which is about half the occupancy rate of the second floor 
carrels — Library users also regarded this area as much more crowded, noisy, and busy. How can an area 
with a 25%-40% occupancy rate be perceived as busy? Gibbons and Foster (2007) note that in the case of 
table space for group work, an eight-seat table is considered full by students if there are four or five students 
sitting there, with laptops, notebooks, textbooks, cell phones, beverages, and other paraphernalia. A group 
study area containing large tables and one hundred chairs might therefore be perceived by users to be ‘full’ 
when only approximately 50% of the seats at each table (and 50 seats overall) are occupied. This suggests 
that practical occupancy limits for open plan group study spaces could be significantly lower than the 
theoretical maximum seating. As a thought experiment, if the group work areas in the basement are 
perceived by users to be fully occupied when only 50% of seats are actually taken, then the perceived 
average occupancy rate would double from 25-40% to 50-80%, and would be colored green/yellow/orange 
rather than blue/green in Figure 4. 
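
The thought experiment above can be made concrete with a small calculation. The sketch below is illustrative 
only: the function name and the full_fraction parameter are assumptions introduced here, not part of the 
study's method. It rescales a measured occupancy rate by the share of seats at which users are assumed to 
regard a space as full. 

```python
def perceived_occupancy(measured_rate: float, full_fraction: float = 0.5) -> float:
    """Rescale a measured occupancy rate (0..1) against a practical capacity.

    full_fraction is the share of seats at which users are assumed to
    regard the space as full (0.5 in the thought experiment). The result
    is capped at 1.0, meaning 'perceived as full'.
    """
    if not 0.0 <= measured_rate <= 1.0:
        raise ValueError("measured_rate must be between 0 and 1")
    if not 0.0 < full_fraction <= 1.0:
        raise ValueError("full_fraction must be in (0, 1]")
    return min(measured_rate / full_fraction, 1.0)


# Basement group-study range reported in the text: 25% to 40% measured.
low = perceived_occupancy(0.25)
high = perceived_occupancy(0.40)
print(f"perceived occupancy: {low:.0%} to {high:.0%}")
```

With the default assumption of 0.5, the measured 25-40% range maps to 50-80%. A different value of 
full_fraction could be substituted for seating types where users tolerate a higher density, such as carrels. 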

It could be argued that these results are peculiar to the library that was studied. To begin thinking 
about how the results might be generalizable, it is useful to return to Bennett’s (2009) model of three library 
space paradigms. In the survey data, and particularly the triangulated data, there is evidence of both second 
and third paradigm spaces in the Library. In second paradigm terms, there are (quiet) solo carrels placed 
around the stacks on the second floor; and in third paradigm terms, there are (noisy) group study spaces 
in the open plan basement. In each space, the dynamics between the occupancy levels and users’ sense of 
place varied in complex ways. 

What is generalizable from this research so far is not so much the claims regarding specific 
dimensions of users’ interactions with carrels, group spaces, and so on — although from anecdotal feedback 
to an earlier version of this paper, it might be assumed that similar phenomena would be observed in other 
library settings — but (a) that a general historical model of library architectonics is a useful one to approach 
library planning, and (b) that there is therefore a need for more fine-grained and distinct occupancy models 
that can probe the interactions between technology, pedagogy, and solo and group student work. This 
suggests in turn the need for a rethinking of some models and methods for assessing and managing library 
buildings. What appears to have been a relatively straightforward positive 
correlation between density of occupation and affective dimensions of place, such as noise and crowdedness, 
is challenged by the research findings, which suggest the need for libraries to think about these phenomena 
and the relationships between them in new ways. In particular, the results suggest that a number of library 
metrics models might be specific to second paradigm buildings and spaces, and therefore that new third 
paradigm models need to be developed. While there is plenty of anecdotal evidence for many dimensions of 
such change, gaining a systematic understanding of the sociotechnical nuances involved (for instance, 
understanding how students use reconfigurable group spaces to support technology use in groups, and judge 
whether a space is ‘full’ or not) remains an ongoing task. 


5 Conclusion 


This paper has introduced a heat map method for visualizing library seating occupancy. The method proved 
useful in visualizing the busy and quiet areas of the library. When combined with other face-to-face survey 
data gathered by the Library, a complex dynamic was identified between occupancy levels, and perceptions 
of noise, quiet, and occupancy. There is a need to assess in more detail how changes in technology use by 
students are impacting library space planning and management. These questions will be explored in future 
work. 
Abstract 

Prototypical instances of disinformation include deceptive advertising (in business and in politics), 
government propaganda, doctored photographs, forged documents, fake maps, internet frauds, fake 
websites, and manipulated Wikipedia entries. Disinformation can cause significant harm if people are 
misled by it. In order to address this critical threat to information quality, we first need to understand 
exactly what disinformation is. After surveying the various analyses of this concept that have been 
proposed by philosophers and information scientists, I argue that disinformation is misleading information 
that has the function of misleading. 
1 Introduction 


Prototypical instances of disinformation include deceptive advertising (in business and in politics), 
government propaganda, doctored photographs, forged documents, fake maps, internet frauds, fake 
websites, and manipulated Wikipedia entries. Disinformation can be extremely dangerous. When people are 
misled about important topics, such as investment opportunities, medical treatments, or political 
candidates, it can cause serious emotional, financial, and even physical harm. 

Inaccurate information (or misinformation) can mislead people whether it results from an honest 
mistake, negligence, unconscious bias, or (as in the case of disinformation) intentional deception. But 
disinformation is particularly dangerous because it is no accident that people are misled. Disinformation 
comes from someone who is actively engaged in an attempt to mislead. Thus, developing strategies for 
dealing with this threat to information quality is a particularly pressing issue for information science (see 
Hernon 1995, Lynch 2001, Piper 2002, Walsh 2010, Rubin & Conroy 2012, Karlova & Fisher 2013). 

In order to develop such strategies, we first need to improve our understanding of the nature and 
scope of disinformation. Toward this end, several philosophers and information scientists (e.g., Floridi 1996, 
Fetzer 2004, Floridi 2005, Fallis 2009, Floridi 2011) have offered analyses of the concept of disinformation. 
In this note, I provide counter-examples to show that all of these analyses either (a) exclude important 
forms of disinformation and/or (b) include innocuous forms of information that should not be counted as 
disinformation. I then propose and defend a new functional analysis of the concept of disinformation. 
2 Previous Analyses of Disinformation 


2.1 Floridi 1996 


In one of the earliest discussions of the concept of disinformation, the philosopher Luciano Floridi (1996, 
509) claimed that “disinformation arises whenever the process of information is defective.” However, this 
analysis is too broad. When someone makes an honest mistake, like The Chicago Tribune reporting that 
“Dewey Defeats Truman,” something in the process is defective. But such accidental falsehoods clearly are 
not disinformation. While they can certainly be misleading, it is only an accident if people are misled. The 
source of the information does not intend to mislead anyone (and does not benefit from people being misled). 


2.2 Floridi 2005 


Several years later, Floridi (2005, §3.2.3) claimed that “when semantic content is false, this is a case of 
misinformation ... And if the source of misinformation is aware of its nature, one may speak of 
disinformation.” In other words, disinformation is inaccurate information that the source knows to be 
inaccurate. This analysis repairs the shortcoming with Floridi’s 1996 analysis. It does not count accidental 
falsehoods as disinformation. When they ran the “Dewey Defeats Truman” story, the editors of The Chicago 
Tribune were not aware that the story was false. 

However, Floridi’s 2005 analysis is also too broad. For instance, when you tell someone a joke or 
speak sarcastically, you are aware that what you are saying is false. But you are not spreading 
disinformation. Even though jokes and sarcastic comments are false, they are not misleading. The person 
to whom you are speaking is also aware that what you are saying is false. 


2.3 Fetzer 2004 


The philosopher James Fetzer (2004, 231) claims that disinformation “should be viewed more or less on a 
par with acts of lying. Indeed, the parallel with lying appears to be fairly precise.” In other words, 
disinformation is a statement that the speaker believes to be false and that is intended to mislead. This 
analysis repairs the shortcoming with Floridi’s 2005 analysis. For instance, it does not count jokes and 
sarcastic comments as disinformation. Jokes and sarcastic comments are not lies because they are not 
intended to mislead (see Mahon 2008, §1.4). 

However, Fetzer’s analysis is also too broad. Someone who intends to spread disinformation with a 
lie might not succeed in doing so. Even though she believes that what she says is false, it might actually 
(unbeknownst to her) be true (see Mahon 2008, §1.2). While such accidental truths are lies, they are not 
disinformation because they are not actually misleading. 

In addition, and more importantly, this analysis is too narrow. Lies are linguistic expressions, such 
as “A wolf is chasing my sheep!” (see Mahon 2008, §1.1). However, doctored photographs and falsified maps 
are also prototypical instances of disinformation. It is no accident when people are misled by such visual 
disinformation because that is precisely what the source of the information intended. 


2.4 Floridi 2011 


In his most recent discussion of the concept of disinformation, Floridi (2011, 260) claims that 
“misinformation is ‘well-formed and meaningful data (i.e. semantic content) that is false.’ ‘Disinformation’ 
is simply misinformation purposefully conveyed to mislead the receiver into believing that it is information.” 
In other words, disinformation is inaccurate information that the source intends to mislead the recipient. 
This analysis is along the same lines as Fetzer’s analysis, but it repairs the shortcomings with 
Fetzer’s analysis. First, since Floridi explicitly requires that disinformation be false, accidental truths do 
not count as disinformation. Second, although he tends to focus on propositional information in his work, 
Floridi (2011, 84) allows that images and maps count as information. Thus, visual disinformation counts as 
disinformation on Floridi’s 2011 analysis. 

However, Floridi’s 2011 analysis is also too broad. Even if she says something that actually is 
inaccurate, someone who intends to spread disinformation still might not succeed in doing so. For instance, 
even though they are (unrealistically) intended to be misleading, implausible lies are not disinformation 
because they are not actually misleading. 

In addition, and more importantly, this analysis is too narrow. Although prototypical instances of 
disinformation are inaccurate, disinformation can sometimes be accurate. For instance, politicians often use 
spin to mislead the public (i.e., they selectively emphasize only certain facts). Like prototypical instances 
of disinformation, such true disinformation is intentionally misleading and it poses a similar risk of harm 
to the recipient. 

In fact, there is another respect in which Floridi’s 2011 analysis is too narrow. Although 
disinformation is always misleading, it is not always intended to mislead. For instance, inaccurate 
information has been intentionally placed on the internet for purposes of education and research (see Hernon 
1995, Piper 2002, 19). A fake website advertising a town in Minnesota as a tropical paradise was created to 
teach people how to identify inaccurate information on the internet. In such cases, while the educators and 
researchers certainly foresee that people might be misled by their inaccurate information, they do not intend 
that anybody actually be misled. Even so, such side effect disinformation probably should count as 
disinformation. Just as with prototypical instances of disinformation, it is no accident when people are 
misled. Although the educators and researchers do not intend to mislead anyone, they do intend their 
inaccurate information to be misleading. For instance, a fake website would not be a very effective tool for 
teaching people how to identify inaccurate information on the internet if it was clear to everyone that it 
was a fake. 


2.5 Fallis 2009 


According to the information scientist Don Fallis (2009, §5), disinformation is “misleading information that 
is intended to be (or at least foreseen to be) misleading.” This analysis repairs the shortcomings with 
Floridi’s 2011 analysis. First, since Fallis explicitly requires that disinformation be misleading, implausible 
lies do not count as disinformation. Second, since Fallis does not require that disinformation be inaccurate, 
true disinformation counts as disinformation. Third, Fallis does not require that disinformation be intended 
to mislead. The source of the information merely has to foresee that it is likely to mislead. Although the 
educators and researchers described above do not intend to mislead anyone, they do foresee that some 
people may be misled. Thus, side effect disinformation counts as disinformation on Fallis’s analysis. 

However, Fallis’s analysis is too broad. In addition to side effect disinformation, it also counts some 
subtle forms of humor as disinformation. For instance, a significant number of people (including a few 
serious journalists) are actually misled by the satirical stories published in The Onion (see Fallon 2012). 
Moreover, since the editors of The Onion are clearly aware that this sort of thing is going on, they do 
foresee (even if they do not intend) that at least some people will be misled by the stories that they publish. 

Fallis’s analysis can easily be modified though so that it does not count satire as disinformation. 
We can simply leave off the “foreseen to be misleading” clause and say that disinformation is misleading 
information that is intended to be misleading. This modified analysis still counts side effect disinformation 
as disinformation. For instance, as noted above, although educators do not intend to mislead anyone with 
their fake websites, they do intend these websites to be misleading. 

However, Fallis’s analysis is also too narrow. Even when a source of information does not intend to 
mislead anyone and does not foresee that anyone will be misled, it may be no accident that the information 
is misleading. For instance, many of the people who disseminate conspiracy theories (e.g., that the President 
was not born in the United States or that the United States government was behind the 9/11 terrorist 
attacks) believe that what they are saying is true. Thus, they do not intend to mislead anyone, or foresee 
that anyone will be misled, by what they say. Even so, these false claims can mislead people and it is no 
accident that people are misled. There is a mechanism that reinforces the dissemination of these false claims. 
For instance, certain websites and media outlets attract more readers and viewers by promoting these false 
claims.¹ Like prototypical instances of disinformation, such adaptive disinformation is not misleading by 
accident and it poses a similar risk of harm to the recipient. 


2.6 Skyrms 2010 


Recent work in biology on deceptive signaling in animals might provide an analysis of the concept of 
disinformation. According to the philosopher Brian Skyrms (2010, 80), “if misinformation is sent 
systematically and benefits the sender at the expense of the receiver, we will not shrink from following the 
biological literature in calling it deception.” Although Skyrms and the biologists that he cites use the term 
‘deceptive signal’ rather than the term ‘disinformation’, they are trying to capture essentially the same 
concept. Thus, we might say that disinformation is misleading information that systematically benefits the 
source at the expense of the recipient. This analysis repairs the shortcoming with Fallis’s analysis. Although 
people who disseminate conspiracy theories may not intend to mislead others, they do systematically benefit 
from others being misled. Thus, adaptive disinformation counts as disinformation on Skyrms’s analysis. 

However, Skyrms’s analysis is too narrow. Most of the time, disinformation imposes a cost on the 
recipient, as when the villagers waste their time running to the shepherd boy’s aid. However, disinformation 
need not always impose a cost on the recipient. In fact, it is sometimes intended to benefit the recipient. 
For instance, when a friend asks you how he or she looks, you might very well say, “You look great!” even 
if this is not true in order to spare his or her feelings. Admittedly, such altruistic disinformation does not 
pose the same risk of harm to the recipient that prototypical instances of disinformation do. But like 
prototypical instances, altruistic disinformation can be intentionally misleading. 

Skyrms’s analysis can easily be modified, however, so that it counts altruistic disinformation as 
disinformation. We can simply leave off the “at the expense of the recipient” clause and say that 
disinformation is misleading information that systematically benefits the source. It is really just the 
“systematic benefit to the source” clause that is needed to count adaptive disinformation as disinformation.² 

However, Skyrms’s analysis is still too narrow because it rules out the possibility of disinformation 
that does not benefit the source. Most of the time, disinformation does systematically benefit the source. 
However, it need not always do so. For instance, in order to avoid embarrassment, people often lie to their 
doctors about their diet, about how much they exercise, or about what medications they are taking (see 
Reddy 2013). If their doctors are misled, it can lead to incorrect treatment recommendations that can harm 
the patient. Admittedly, such detrimental disinformation may not pose the same risk of harm to the 
recipient that prototypical instances of disinformation do. But like prototypical instances, detrimental 
disinformation is intentionally misleading. 


Clearly Disinformation                                          Clearly Not Disinformation 

Malicious Lies (ML)               non-accidentally misleading   Truthful Statements (TS)     not misleading 
Visual Disinformation (VD)        non-accidentally misleading   Accidental Falsehoods (AF)   only accidentally misleading 
True Disinformation (TD)          non-accidentally misleading   Jokes (J)                    not misleading 
Side Effect Disinformation (SE)   non-accidentally misleading   Sarcastic Comments (SC)      not misleading 
Adaptive Disinformation (AD)      non-accidentally misleading   Accidental Truths (AT)       not misleading 
Altruistic Disinformation (AL)    non-accidentally misleading   Implausible Lies (IL)        not misleading 
Detrimental Disinformation (DE)   non-accidentally misleading   Satire (S)                   only accidentally misleading 

Table 1: Counter-examples to Previous Analyses of Disinformation 


1 If the editors of The Onion do not just foresee that some readers will be misled, but actually benefit from this happening, then their 
stories probably should count as disinformation. 

2 In fact, Skyrms (2010, 76) himself notes that his analysis might be modified in this way. He just failed to see that this sort of 
modification was actually necessary. 


3 A New Analysis of Disinformation 


Even though the modified Fallis analysis (in terms of an intention to be misleading) and the modified 
Skyrms analysis (in terms of a systematic benefit from being misleading) are too narrow, together they 
arguably capture all instances of disinformation. It would be unfortunate though if we had to resort to such 
a disjunctive analysis of disinformation. If an analysis requires two independent criteria, it suggests that we 
are really dealing with two separate phenomena (see Kingsbury & McKeown-Green 2009, 578-81). However, 
there is something that unifies all of the cases of disinformation discussed above. Disinformation is 
misleading information that has the function of misleading someone. 

Roughly speaking, a function is “the action for which a person or thing is particularly fitted or 
employed” (American Heritage 2000). For instance, the function of a heart is to pump blood. Also, the 
function of a chair is to be sat upon. According to this analysis, the distinguishing feature of disinformation 
is that its function is to mislead people. 

It should be noted that there are at least two different ways that something might acquire a function 
(cf. Graham 2010, 153-55). For instance, a heart has the function of pumping blood because that is what it 
evolved to do. By contrast, a chair has the function of being sat upon because that is what it was designed 
to do. In other words, the designer of the artifact intended it to have that function. 

Disinformation can acquire the function of misleading people in either of these two ways. Most 
forms of disinformation, such as lies and propaganda, are misleading because the source intends the 
information to be misleading. But other forms of disinformation, such as conspiracy theories, are misleading 
simply because the source systematically benefits from their being misleading. Even though they might 
differ in terms of how that function was acquired, all instances of disinformation are unified by the fact that 
they have a certain function. And however that function was acquired, it is no accident that the information 
is misleading. 

This analysis of disinformation does not seem to be too narrow. For instance, the adaptive 
disinformation that caused a problem for the modified Fallis analysis has the function of misleading (because 
the source systematically benefits from it being misleading). Also, the detrimental disinformation that 
caused a problem for the modified Skyrms analysis has the function of misleading (because the source 
intends it to be misleading). 

In addition, this analysis does not seem to be too broad. It does not count as disinformation any of 
the cases discussed above that are not disinformation. For instance, accidental truths and implausible lies 
are not misleading. Also, while accidental falsehoods and satire can sometimes be misleading, the source of 
the information does not intend to mislead people nor does she systematically benefit from people being 
misled. If people are misled, it is just an accident.³ 
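
The functional analysis can be summarized as a simple predicate. The sketch below is an illustrative encoding 
only, not part of the original argument: the Case record and its three boolean fields are assumptions made 
here, and the values assigned to each example follow the descriptions given in this note. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    misleading: bool            # is the information actually misleading?
    intended_misleading: bool   # does the source intend it to be misleading?
    systematic_benefit: bool    # does the source systematically benefit from its being misleading?


def has_function_of_misleading(case: Case) -> bool:
    # A function can be acquired by design (intention) or through
    # systematic benefit, by analogy with artifacts and evolved traits.
    return case.intended_misleading or case.systematic_benefit


def is_disinformation(case: Case) -> bool:
    # Misleading information that has the function of misleading.
    return case.misleading and has_function_of_misleading(case)


cases = [
    # Benefit is left False where the note does not specify it; it does
    # not change the verdict once the intention condition holds.
    Case("malicious lie", True, True, False),
    Case("side effect disinformation", True, True, False),
    Case("adaptive disinformation", True, False, True),
    Case("detrimental disinformation", True, True, False),
    Case("accidental falsehood", True, False, False),
    Case("satire", True, False, False),
    Case("implausible lie", False, True, False),
    Case("accidental truth", False, True, False),
]

for case in cases:
    print(f"{case.name}: {is_disinformation(case)}")
```

Encoding the two routes to a function as a single helper keeps the top-level definition non-disjunctive, 
which mirrors the point that the cases are unified by having the function rather than by how it was acquired. 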


3 As it stands, this functional analysis handles the cases discussed in this note. But there are some further complications that will be 
addressed in the full paper. For instance, a piece of information is often addressed to a large audience, and it may have the function 
[Table 2 matrix. Rows (analyses): Floridi 1996, Floridi 2005, Fetzer 2004, Floridi 2011, Fallis 2009, 
Fallis modified, Skyrms 2010, Skyrms modified, Functional Analysis. Columns (counter-examples from Table 1): 
ML, VD, TD, SE, AD, AL, DE under Clearly Disinformation; TS, AF, J, SC, AT, IL, S under Clearly Not 
Disinformation.] 


Table 2: Analyses of Disinformation and Counter-examples 


4 Conclusion 


Disinformation can cause significant harm if people are misled by it. But in order to address this critical 
threat to information quality, we first need to understand exactly what disinformation is.⁴ After surveying 
the various analyses that have been proposed, I have argued that disinformation is misleading information 
that has the function of misleading. 


5 References 


American Heritage. (2000). Dictionary of the English language, Fourth edition. Boston: Houghton Mifflin. 

Fallis, D. (2009). A conceptual analysis of disinformation. Proceedings of the iConference. 
http://ischools.org/images/iConferences/fallis_disinfol.pdf 

Fallon, K. (2012). Fooled by ‘The Onion’: 9 most embarrassing fails. The Daily Beast. 
http://www.thedailybeast.com/articles/2012/09/29/fooled-by-the-onion-8-most-embarrassing-fails.html 

Fetzer, J. H. (2004). Disinformation: The use of false information. Minds and Machines 14, 231-40. 

Floridi, L. (1996). Brave.net.world: The internet as a disinformation superhighway? Electronic Library 14, 
509-14. 

Floridi, L. (2005). Semantic conceptions of information. Stanford Encyclopedia of Philosophy. 
http://plato.stanford.edu/entries/information-semantic/ 

Floridi, L. (2011). The philosophy of information. Oxford: Oxford University Press. 

Graham, P. (2010). Testimonial entitlement and the function of comprehension. In A. Haddock, A. Miller, 
& D. Pritchard (Eds.), Social epistemology (pp. 148-74). Oxford: Oxford University Press. 

Hernon, P. (1995). Disinformation and misinformation through the internet: Findings of an exploratory 
study. Government Information Quarterly 12, 133-39. 


of misleading only part of that audience. In addition, although it is conceptual progress to see how the concept of disinformation is 
related to the concept of function, a more detailed analysis of what a function is will ultimately be required. 

4 The conceptual work in this note helps us to address this threat to information quality simply by identifying several important types 
of disinformation that we need to be aware of. In the full paper, I will discuss further how this functional analysis might help us to 
detect disinformation and to deter its spread. 


iConference 2014 Don Fallis 


Karlova, N. A., & Fisher, K. E. (2013). A social diffusion model of misinformation and disinformation for 
understanding human information behavior. Information Research 18. 
http://informationr.net/ir/18-1/paper573.html 

Kingsbury, J., & McKeown-Green, J. (2009). Definitions: Does disjunction mean dysfunction? Journal of 
Philosophy 106, 568-85. 

Lynch, C. A. (2001). When documents deceive: Trust and provenance as new factors for information 
retrieval in a tangled web. Journal of the American Society for Information Science and Technology 
52, 12-17. 

Mahon, J. E. (2008). The definition of lying and deception. Stanford Encyclopedia of Philosophy. 
http://plato.stanford.edu/entries/lying-definition/ 

Piper, P. S. (2002). Web hoaxes, counterfeit sites, and other spurious information on the internet. In A. 
P. Mintz (Ed.), Web of Deception (pp. 1-22). Medford, New Jersey: Information Today. 

Reddy, S. (2013). 'I don't smoke, doc,' and other patient lies. Wall Street Journal. 
http://online.wsj.com/article/SB10001424127887323478004578306510461212692.html 

Rubin, V. L., & Conroy, N. (2012). Discerning truth from deception: Human judgments and automation 
efforts. First Monday 17. http://firstmonday.org/ojs/index.php/fm/article/view/3933/3170 

Skyrms, B. (2010). Signals. New York: Oxford University Press. 

Walsh, J. (2010). Librarians and controlling disinformation: Is multi-literacy instruction the answer? 
Library Review 59, 498-511. 


6 Table of Tables 


Table 1: Counter-examples to Previous Analyses of Disinformation 
Table 2: Analyses of Disinformation and Counter-examples 


The Geography of Censorship: Communities, Challengers, and Harry Potter 


Emily J. M. Knox¹ 


1 University of Illinois at Urbana-Champaign 


Abstract 

The Harry Potter series was one of the most censored books when it was first published. Through the 
use of census and other publicly available data, this paper attempts to answer the following questions using 
the Harry Potter series as a case study: Is the perception of the pervasiveness of challenges accurate? Are 
there any commonalities among communities that experience challenges? Are some types of communities 
more prone to challenges than others? What are the characteristics that might unite these communities? 
The paper investigates the commonalities and differences among 23 communities that experienced challenges 
to Harry Potter during the years 1999-2007. 
1 Introduction 


In the United States, challenges to materials in public institutions are fairly common. Although the data 
on challenges demonstrate that they happen in communities of all types across the country, when the 
general public hears about such incidents, they tend to believe that they are solely a Bible Belt or small 
town phenomenon. Researchers often state that the ubiquity of challenges is one of the most perplexing 
aspects of working in a library: one never knows when a book will be challenged or which book will be 
targeted. The American Library Association, one of the primary aggregators of book challenge statistics, 
noted that 463 challenges were reported in 2012. Although the association does not release geographic 
information for the challenges, its materials on book challenges do note that, since statistics were first 
collected in 1982, requests for removal or relocation have taken place in all 50 states. 

The investigative framework for this paper is based on an article written by Barbara Luebke (2000) 
in response to Douglas Archer’s work on religion and intellectual freedom. Luebke notes that she views 
religious censorship as essentially a community issue. There are residents of her small town in Indiana who 
move to her location because they believe that it is conservative. Often these new community members are 
surprised to find out that the community is much more heterogeneous than they expected. Luebke writes 
that this group tends to be “very vocal about what they believe should be acceptable to everyone” (Luebke, 
2000, para. 2). It is possible that this insider/outsider narrative and the perception that a given community 
does not live up to particular expectations may partially explain the justifications that are given for many 
challenges. This paper argues that, in essence, community change drives complaints regarding materials in 
the library. 

This paper, which is concerned with the communities in which challenges take place, attempts to 
answer several questions concerning the ubiquity and pervasiveness of challenge cases. These questions 
include the following: Is the perception of the pervasiveness of challenges accurate? Are there any
commonalities among communities that experience challenges? Are some types of communities more prone to
challenges than others? What are the characteristics that might unite these communities? In order to address
these questions, this paper discusses challenges to the Harry Potter series and explores the communities in 
which these incidents took place. It delineates some of the communities’ uniting characteristics and also 
discusses how these communities differ. It is hoped that by better understanding the geography of
censorship, librarians and other information professionals can be better prepared for any book challenges that
might occur in their institutions. 

This study constitutes a pilot study for a larger research project and was conducted in order to test 
the methodology and viability of the project. The larger research endeavor will focus on all publicly
available challenge cases within a given time frame and will include demographic and other information
from the U.S. Census and the General Social Survey along with data from interviews and/or focus groups. 


2 On Challengers 


Although, as noted above, challengers come from all demographic groups, there is one defining characteristic
that unites them: challengers tend to be parents. In fact, as James LaRue notes, they tend to be parents
with children between the ages of 4-6 and 14-16 (LaRue, 2007, p. 71). In previous work, the author of this 
study found that challengers also tend to share certain worldviews including a fear that contemporary 
society is disintegrating, concern regarding the protection of children’s innocence, and a common sense 
interpretive strategy that they apply to all texts (Knox, 2012). However, there has been very little research 
that systematically locates and describes the various locations and communities in which book challenges 
take place. 

One exception is the 2009 Mapping Banned Books Project
(http://civic.mit.edu/blog/petey/mapping-banned-books-2012), a joint project of the American Library
Association, the MIT Center for Civic Media, and the National Coalition Against Censorship. The map uses
publicly available data to locate where challenges have taken place in the United States. The creator of
the map, Chris Peterson, notes that the map demonstrates how geographically dispersed challenges are and
how much is still not known about challenge cases. The map was updated in 2011 (http://goo.gl/maps/pWké6).

There has also been work on communities that have experienced challenges and censorship. For 
example, Louise S. Robbins’s The Dismissal of Miss Ruth Brown (2001) focuses on a public librarian in
Bartlesville, Oklahoma who, in the 1950s, was accused of being a communist sympathizer. Although the
case was ostensibly about communist propaganda that was planted in the library, Robbins notes
that members of the community were actually concerned about Brown’s work for racial equality. Another 
work that focuses on censorship and communities is Shirley A. Wiegand and Wayne A. Wiegand’s Books 
on Trial (2007). Like Robbins’s book, it also centers on the presence of communist works in Oklahoma. 
Wiegand and Wiegand detail a local raid on a progressive bookstore and its implications for the community. 


3 Challengers to Harry Potter 


J.K. Rowling’s Harry Potter series is the focus of this paper. First published in the United States in 1998, 
the series was the target for many challenges. The author chose Harry Potter for this paper because it 
allows the reason for the challenge to remain constant while exploring the different communities in which 
the challenges took place. In almost all cases, the books were challenged because they included the theme 
of witchcraft. According to Robert P. Doyle’s Banned Books (2010), between 1999 and 2007 there were 
approximately 50 challenges to the individual books in the Harry Potter series. Reasons for banning also 
included themes such as death, killing, drinking animal blood, the occult, or satanic deception. However, in 
almost all cases, witchcraft is also mentioned. 

Although it is not the focus of this paper, one might surmise that most of the challenges to the 
Harry Potter series were brought by evangelical and fundamentalist Christians. It should be noted that 
though these terms are used interchangeably by the general public, they do not describe people with the 
same set of religious beliefs. Evangelicals are generally understood to be Christians whose theology
emphasizes a personal relationship with Jesus. They may or may not be right wing in their orientation.
Fundamentalists, on the other hand, are often considered to be a subset of evangelicals; they believe in the
inerrancy of the Bible, are hostile to modern theology, and are often critical of the authenticity of other
Christians (Barr, 1984). However, even with these theological differences, both of these groups would be concerned with the
occult aspects of Harry Potter and the mass consumption of such problematic material. 

There have been several investigations into censorship and the Harry Potter series. These tend to 
be essays that attempt to understand what challengers mean by the term “witchcraft.” Peter H. Denton 
(2002), for example, notes that concern over witchcraft is really more of a deflection for challengers and 
that they are much more concerned about the popularity of the books. Denton notes that “the real issue 
here isn’t witches, its culture, and more specifically, popular culture. The problem isn’t so much Harry 
Potter, as Harry Potter books, which are ending up on far more bookshelves in children’s bedrooms than 
Mandy Goes to Bible Camp” (Denton, 2002, p. 29). Although challengers are concerned with exposing 
children to the occult, this fear is often bound up with concerns regarding how children are raised in
contemporary society. Denton does note, however, that Rowling’s universe does not necessarily match a 
Christian moral universe in that sometimes there are not equivalent consequences for certain behaviors. 
That is, bad behavior does not always lead to punishment and vice versa. 

Another author, Perry L. Glanzer (2004), takes a higher-level view and argues that controversies over
Harry Potter were actually over clashing worldviews. He argues that there are some individuals in
contemporary society who are truly concerned with the themes found in Harry Potter and argues that teachers
should take their concerns seriously. In particular, Glanzer argues that educators should respond to these
challenges without hypocrisy by acknowledging the power of books and exposing children to many different 
worldviews throughout the course of their education. 

Similar to Denton, in her article on the social context of Harry Potter challenges, Amanda Cockrell 
(2006) argues that popularity might be one explanation for people’s uneasiness with the Harry Potter series. 
She also notes that fundamentalist Christians might be more apt to blame the occult for any poor behavior 
that they observe in their children. Cockrell quotes a study which found that fundamentalist parents tended 
to equate fantasy with deceit. Cockrell argues that Harry Potter is controversial because the series’ universe 
is rooted in the real world and because Rowling is a skilled parodist of fundamentalists. Regarding the first 
point, Cockrell states that Rowling’s vision of witchcraft actually matches how challengers understand 
witchcraft to operate. For fundamentalist Christians witchcraft is “like angels or the voice of Satan, it is 
out there, unseen but ready to swallow up the hapless child who can be turned toward its seductive allure, 
and that it actually works” (Cockrell, 2006, p. 26). Although we cannot see it, Rowling’s witchcraft exists 
in our own world, an idea that ties well with certain understandings of the occult. Cockrell also argues that 
the Dursleys, Harry Potter’s aunt and uncle who adopt him after the death of his parents, are a caricature
of religious belief. “Coarse, pragmatic materialists the Dursleys are but medieval they also are. They believe
in magic, and therefore fear it deeply” (Cockrell, 2006, p. 28). The Dursleys, who are portrayed as oafs 
throughout the series, are like those who challenge the Harry Potter series—they also believe that magic 
exists. As with the previous explanation, Cockrell notes that Rowling’s portrayal seems to be too real for 
some in our society. 

These studies are exemplars of the work on the people who challenged Harry Potter. As noted 
above, these studies did not focus on the various characteristics that defined the communities in which 
these challenges took place. This paper attempts to explore the traits that unite or divide such communities. 


4 Method 


All information for this paper comes from Robert P. Doyle’s 2010 bibliography, Banned Books: Challenging 
Our Freedom to Read. The challenges to books in the Harry Potter series took place between 1999 and 
2007. Only the first five books of the series are included in the bibliography. In many cases several books 
in the series were challenged at one time. 
The author used Census.gov to find county information for cities that were listed. Counties were
used as the unit of analysis in order to maintain consistency across all of the cases. From an initial list of
approximately 50 challenges, 23 counties were included in the study (Appendix A).¹

In order to better understand the demographic makeup of a particular community, the following
information was collected for each county and will be described in more detail below:

1. Total population for the years 1990, 2000, and 2010
2. Percentage of population identifying itself as white for 1990, 2000, and 2010
3. Median income for the years 1989, 1999, and 2005-2009
4. Educational attainment of residents for 1990, 2000, and 2010
5. Number of Christian adherents

The author chose these data points because they seem to most broadly define how one might describe a
particular community: by size, racial makeup, income, and education. Although other data might also be
included, these constituted a starting point for the pilot study.

The U.S. 2010 Census² defines population as the count of individuals in a given location during a
specific time period. Percentage white is the percentage of the given population who self-identify as White.
Median income was used over mean in order to have a better sense of what most people make in a particular
county without any outliers that might obscure the average. Adherent data come from the Religious
Congregations and Membership Study³. Adherents are defined as a complete count of the individuals affiliated
with a congregation. 

Although it would be possible to run statistical tests on the data collected for the study, the author 
decided that only descriptive information was necessary. Statistical tests would show if there were significant 
differences between two data points given but, as will be discussed in further detail below, it is probable 
that the perception of change in these communities matters more to the challengers than the actual statis- 
tical significance of change among the different data. 


5 Map of Harry Potter Cases 


One of the most important aspects of this study was simply mapping the locations of Harry Potter challenge 
cases. Since these cases were instigated because of the books’ theme of witchcraft, one might surmise that
all of the cases took place in states that have large populations of evangelical and fundamentalist
Christians. However, the cases covered all regions of the country, not just the Bible Belt or conservative areas.


1 All cases in South Carolina were removed because no city was given. 
2 http://www.census.gov/2010census/
3 http://www.rcms2010.org/compare.php
6 Findings

Although there was not one defining characteristic that covered all of the counties that experienced Harry 
Potter challenges between 1999 and 2007, there were several patterns that could be discerned through careful
examination of the demographic data. These have been divided into three major categories each discussed 
in turn below: religious changes, educational changes, and population changes. The author would like to 
reiterate that these are preliminary findings. 


6.1 Religious Changes 


As noted above, “religious adherents” refers to the number of people reported as members by local Christian 
congregations. One of the most interesting findings in the communities that challenged Harry Potter was 
the volatile nature of these numbers. There is no defining characteristic among the various communities. 
Some experienced an increase (Fresno, CA, from 40% to 47%) while others experienced a large drop (Otero, NM,
from 73.4% to 46.2% between 1990 and 2000). In areas where there was a decrease in overall population
numbers, one sees a corresponding decrease in the number of adherents (Cattaraugus, NY, from 48% to 41.6%,
and Saginaw, MI, from 57.4% to 49.5%). However, one can also see in these numbers that membership in
churches dropped from around half of the population to around two-fifths in the first case and, in the
second case, from a clear majority to just under half.

It is important to recall that the reduction of congregants can result in the closing of churches in 
neighborhoods. That is, this decrease can have a physical manifestation in the community. On the other
hand, in communities that saw an increase in adherents, it is possible that this increased particular
individuals’ confidence in flexing their political muscle over institutions such as the public school or library.


6.2 Educational Changes 

Without exception, every community that experienced a Harry Potter challenge also experienced an increase 
in the number of people who had obtained bachelor’s degrees by the age of 25. One of the communities that
had a large increase in this number was Douglas, Colorado, which saw a jump from 40.7% up to 51.9% from
1990 to 2000. Ottawa, Michigan’s population of individuals with BAs increased from 18.7% to 26% and
Chester, Pennsylvania from 34.7% to 42.5% during the same time period. 
It is difficult to know how this change in educational status among community members was perceived.
It is possible that these college-educated individuals brought new ideas about reading and religious
practices with them. Recall the Luebke quote above in which she mentions that some people moved to her
community in order to be surrounded by conservative individuals. It is possible that these changes in 
educational attainment indicate why they did not find the communities they expected in these areas. 


6.3 Population Changes 


Perhaps the most interesting characteristic found in all of the communities that experienced a Harry Potter
challenge was that they all also experienced a change in racial makeup. Even when there was a slight
decrease in the overall total population, there were concurrent drops in the percentage of people
identifying as white. For example, even though the population in Erie, New York dropped
from 968,532 to 950,265 from 1990 to 2000, the percentage of the population that was white dropped from
85.9% to 82.2%. In Cattaraugus, New York, the population dropped slightly from 84,234 to 83,955 while
the percentage that was white dropped from 96% to 94.9%. On the other hand, counties with slight
increases in population such as New Haven, Connecticut (804,219 to 824,008) experienced significant
changes in the racial makeup of the county (85.5 to 79.4 percent white) from 1990 to 2000.
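
As an illustration only, the descriptive comparison reported in this section can be reproduced in a few lines of Python. The figures are those quoted above for 1990 and 2000; the dictionary layout and the function name are hypothetical and are not part of the original study.

```python
# County figures quoted in Section 6.3: (1990 value, 2000 value).
counties = {
    "Erie, NY": {"population": (968_532, 950_265), "pct_white": (85.9, 82.2)},
    "Cattaraugus, NY": {"population": (84_234, 83_955), "pct_white": (96.0, 94.9)},
    "New Haven, CT": {"population": (804_219, 824_008), "pct_white": (85.5, 79.4)},
}


def summarize(record):
    """Return the relative population change and the change in share white."""
    pop_1990, pop_2000 = record["population"]
    white_1990, white_2000 = record["pct_white"]
    return {
        # Relative change in total population, in percent.
        "pop_change_pct": round(100 * (pop_2000 - pop_1990) / pop_1990, 2),
        # Change in the share identifying as white, in percentage points.
        "white_change_points": round(white_2000 - white_1990, 1),
    }


for name, record in counties.items():
    print(name, summarize(record))
```

Separating the relative change in population from the percentage-point change in racial makeup makes the pattern described above easier to see: the share identifying as white falls in all three counties whether the total population shrinks or grows.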

As noted above, it is difficult to know how these changes in population composition would be 
perceived by communities. It is possible that these changes in the racial makeup would be perceived as a 
threat to the cohesiveness of the community by challengers. However, it is impossible to verify this without
further investigation.


7 Future Directions 


Similar to many pilot studies, this investigation into the communities that experienced Harry Potter
challenges exposed more questions than it answered. Although there were some tenuous similarities among the
communities, only one characteristic (changes in racial demographics) was found in all of the counties. It 
should also be noted that this study did not include a control group and such changes might be found in 
communities that did not experience a challenge. 

In light of this, there are several different directions that this study might take. As noted above, 
this paper is a pilot study for a larger investigation into the geography of censorship. The larger project will 
have three parts. The first will consist of updating the Map of Censorship. It is hoped that the map will be 
dynamic and indicate the year of the challenge, location, reason, and initiator. The second part of the study 
will look more in-depth at the communities that experienced challenges during a defined time frame. Using 
data from the Census and the General Social Survey, the researchers will be able to more clearly define the 
geography of censorship using both statistical and survey data. Finally, the third part of the project will consist
of interviews and focus groups with librarians and challengers for a small number of geographically diverse 
challenge cases. 
10 Appendix A: Counties with Harry Potter Challenge Cases


State  County
AL     Cullman
AR     Crawford
CA     Fresno
CA     Los Angeles
CA     Ventura
CO     Douglas
CT     Hartford
CT     New Haven
FL     Duval
FL     Santa Rosa
GA     Gwinnett
IA     Linn
IL     Will
KY     Russell
MA     Middlesex
MI     Ottawa
MI     Saginaw
NM     Otero
NY     Cattaraugus
NY     Erie
OR     Deschutes
PA     Chester
TX     Galveston
Abstract

Twentieth century audio recordings and motion pictures are important sources, both for scholarly analysis 
and for public history. In some cases, important metadata has not reached the collecting institutions 
along with the materials, which are now in need of richer description. This paper describes a novel 
technique for determining the date and time on which a recording was made based on analysis of 
incidentally captured traces of small variations in the electric power supply at the time the recording was 
made. 
1 Introduction


The twentieth century produced two remarkable innovations in information technology: (1) the widespread 
creation of digital content, and (2) the ability to record sound and motion pictures in very substantial 
quantities. In our time, these technologies have since converged, but they initially grew up separately. As 
a result, many of the twentieth century recordings that now form an important part of our cultural heritage 
lack important metadata (Bamberger and Brylawski, 2010). In this paper, we focus on recovering one 
specific type of metadata: the date and time at which a recording was made. We do this by leveraging a 
third great innovation of the twentieth century: electrical power. 

Audio engineers have long known that without proper isolation and filtering, electrical noise can 
cause an annoying low frequency hum in a recording that is audible to the human ear. The reason for this 
is that the electrical network now almost universally uses alternating current with a frequency of 60 Hz (in 
most of North America) or 50 Hz (in much of the rest of the world). This signal typically enters the 
recording device by induction, a process by which an electrical variation in one component propagates to 
another component using electromagnetic coupling. It is difficult to eliminate this coupling completely, so 
audio engineers typically seek to design equipment in a way that will reduce the resulting hum to an 
inaudible level. 

This undesirable hum actually turns out to be useful, however. In the last decade of the twentieth 
century, we learned that this hum varies in detectable ways, and it does so in a pattern that rarely repeats 
for a sufficiently long recording (Grigoras, 2007). The cause of that variation is the somewhat random 
process by which generating capacity and electrical loads are added to and removed from the electrical 
network as demand for electricity changes over time. These activities result in small but detectable 
fluctuations in the Electric Network Frequency (ENF). Because these variations propagate through the 
electrical network extremely quickly, the variations follow very similar patterns everywhere in the network. 
We can use this phenomenon in two ways. First, we can tell when a recording was made by comparing the 
ENF signal in the recording with intentionally recorded ENF traces from the world’s major electrical
networks, which have been collected for this purpose for the past decade or so. Second, we can detect 
whether a recording has been edited because insertions and deletions will result in sudden changes in the 
ENF that do not match the reference ENF trace to which the contiguous segments align. 
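
A minimal sketch of how an ENF trace might be pulled out of a digitized recording is given below. It assumes a mono signal held as a list of samples and uses simple peak picking: for each frame, the windowed signal is correlated with sinusoids on a fine frequency grid around the nominal value, and the strongest one is kept. This is an illustration under those assumptions, not necessarily the extraction method used for the experiments in this paper; the function names, frame length, search band, and grid step are all hypothetical choices.

```python
import math


def band_peak(frame, fs, nominal=60.0, band=0.5, step=0.01):
    """Return the frequency (Hz) within nominal +/- band that carries the most energy."""
    n = len(frame)
    # A Hann window limits leakage from speech or music at neighbouring frequencies.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    best_freq, best_power = nominal, -1.0
    for k in range(int(round(2 * band / step)) + 1):
        freq = nominal - band + k * step
        w = 2 * math.pi * freq / fs
        # Direct evaluation of the Fourier transform at one candidate frequency.
        re = sum(x * math.cos(w * i) for i, x in enumerate(windowed))
        im = sum(x * math.sin(w * i) for i, x in enumerate(windowed))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq


def extract_enf(signal, fs, frame_sec=4.0, hop_sec=2.0, **kwargs):
    """Return one ENF estimate per overlapping frame of the signal."""
    frame = int(frame_sec * fs)
    hop = int(hop_sec * fs)
    return [band_peak(signal[start:start + frame], fs, **kwargs)
            for start in range(0, len(signal) - frame + 1, hop)]
```

Passing `nominal=240.0` would track the fourth harmonic instead of the fundamental. Longer frames give finer frequency resolution at the cost of coarser time resolution, which is the main trade-off when choosing `frame_sec`.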

For earlier times, where we lack ENF references, we can at least determine whether two recordings 
were made at the same time by comparing their ENF traces. This will only work if they were recorded in 
the same electrical network, but these networks typically have a very large geographic extent. For example, 
most of North America is covered by one of four electrical networks, one west of the Rockies, one for Texas, 
one for Quebec, and one for the remainder of North America east of the Rockies. That idea leads to a novel 
approach for recovering reference ENF traces for earlier times. If we can find a set of recordings that were
made at known times in the past and from which we can recover an ENF trace, then we can use those
recordings to establish a set of reference traces.

Interestingly, ENF traces can also be detected in some motion picture recordings, using techniques 
that are based on detecting the imperceptible (to the eye) flicker produced by indoor lighting (Garg, Varna, 
and Wu, 2011; Garg, Varna, Hajj-Ahmad, and Wu, 2013). We therefore need not limit our attention solely 
to audio recordings. We do, however, need to work with relatively long recordings, since the random 
variations in the ENF do (randomly) repeat over short times. It has been shown that about 10 minutes 
worth of recording of average quality suffices to obtain a good match (Huijbregtse and Geradts, 2009), and 
with an hour of recording, errors in determining whether two recordings were made at the same time become 
very low. 

The remainder of this paper is organized as follows. In Section 2, we identify some sources of
historically significant audio, some of which were recorded at known times and others of which were not.
Section 3 then presents some initial experimental results with a few of these sources. Section 4 concludes the
paper with some observations about the implications of this work for archival practice.


2 Twentieth Century Recordings 


In attempting to reverse engineer reference ENF traces using surviving audiovisual materials, we must 
contend with a patchwork landscape of recordings that cannot match the degree of coverage provided by 
more recent forensic databases. This problem is compounded the further back in time we go; the rich 
repositories of sound available to us from the 1970s, for example, contrast with the comparative scarcity of 
sound from just a decade earlier, in the 1960s. In order to compensate for these disadvantages, we have 
started to identify promising sources of audiovisual materials that are of sufficient size and scope to begin 
to lay down a twentieth-century ENF reference resource. To be useful as a reference, these materials must 
be accompanied by trusted metadata. Good candidates include live radio and television broadcasts, such as
sports, news, and entertainment (think Johnny Carson or Saturday Night Live), that have either been
preserved within the organization that originally produced them or acquired by a collecting institution, such 
as the Library of Congress. In the case of historic radio broadcasts, it is often private collectors rather than 
large institutions that have endeavored to save them. NASA space missions and college radio broadcasts 
are also possible sources for reference ENF traces. 

To illustrate some of the challenges and opportunities inherent in this work, consider the Vanderbilt 
Television News Archive, founded in 1968, which contains over 40,000 hours of news footage, including the 
nightly news broadcasts of ABC, CBS, and NBC from 5 August 1968 to the present (1968-2013). Spanning 
nearly five decades, the collection is thus a tremendously rich source of audio and video, with trusted date 
and time stamps that exist on a sufficiently large scale to be of value as a reference. These strengths 
notwithstanding, this archive also has some potential drawbacks. One relates to provenance: although the 
weekday shows were usually recorded from local network affiliates at the time they aired in Nashville,
Tennessee, there are some cases in which these were supplemented by recordings captured in other locations,
not all of which are on the same electrical network (Lynch, 2013). Since this information is not necessarily
available in the archive’s online catalog, it adds an element of uncertainty that must be factored into the 
ENF analysis. Other collections offer complementary coverage, including a large collection of analog tape 
of National Public Radio (NPR) programs dating back to the 1970s (Ottalini, 2013). Because the NPR 
audio has not yet been fully digitized, it may be possible to instrument the digitization process. As explained 
below, that ability can be useful in some cases. 


3 Initial Experiments 


Analog recordings must be digitized before we can extract ENF traces. As a result, ENF traces may be 
present both from the original analog recording process and from the later digitization process. The 
spectrogram of a digitized audio recording from President Kennedy’s White House conversations (Miller 
Center) is shown in Figure 1. The original recording was made in 1962 using an analog tape recorder, and 
digitized later. We observe two different ENF signals near 240 Hz,¹ one of which disappears well before the
end of the digitized audio file. Listening to the audio, we note that the original recording was turned off at 
this time, leaving no recording on the remainder of the tape. We can therefore reasonably conjecture that 
the 239 Hz signal is the ENF trace due to the original recording, and the 240 Hz signal is the ENF trace 
due to the digitization process. The ENF trace from the original recording would be expected to deviate 
from its nominal value (240 Hz) if the tape were playing somewhat more slowly than it should have during 
digitization. Although undesirable for the fidelity of the audio content, this small error works to our benefit 
for ENF analysis. Indeed, since the error can be corrected during digital replay, we might actually prefer to 
use a slightly off-speed analog replay device when seeking to capture ENF traces.
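
The arithmetic behind this observation can be sketched as follows, under the simplifying assumption (made here for illustration) that the trace from the original recording would sit at the nominal harmonic if the tape were replayed at exactly the correct speed. The 239 Hz and 240 Hz values are those read from Figure 1; the function names and the 600-second example are hypothetical.

```python
def speed_ratio(observed_hz, nominal_hz):
    """Replay speed relative to recording speed (below 1.0 means the tape ran slow)."""
    return observed_hz / nominal_hz


def rescale_trace(trace_hz, ratio):
    """Map frequencies measured in the digitized file back to their recorded values."""
    return [f / ratio for f in trace_hz]


def original_duration(digitized_seconds, ratio):
    """A slow replay stretches time, so the original passage was shorter."""
    return digitized_seconds * ratio


ratio = speed_ratio(239.0, 240.0)  # about 0.9958, i.e. roughly 0.4% slow
print(rescale_trace([238.9, 239.0, 239.1], ratio))
print(original_duration(600.0, ratio))
```

In practice the true ENF wanders around its nominal value, so a ratio estimated from a single reading is only approximate. Averaging the observed trace over a long stretch before dividing by the nominal value would give a more stable estimate.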
Figure 1: Spectrogram of a digitized Kennedy recording.


Of course, we are not always so lucky. To explore the worst possible case, completely overlapping ENF 
traces, we first made a recording in an acoustic chamber, and then we later played it back through a speaker
and re-recorded it in the same chamber. From the spectrogram of the recaptured audio shown in the bottom half
of Figure 2, we observe that the two resulting ENF traces overlap completely, creating a single rather 


1 ENF traces in audio recordings made in the United States generally have a nominal value of 60 Hz, and higher harmonics with the
same patterns are found at small integer multiples of that value. Which harmonic is most useful as an ENF trace depends on both the
characteristics of the recording equipment and the sound being recorded. In this case, we are seeing the fourth harmonic, near 240 Hz.
confused pattern.² At the top of the figure (the time after the original audio was switched off) the signal
becomes clean again, as only a single ENF trace is now being captured. We have elsewhere shown that if a 
reference ENF trace is available from the time of the recording, we can essentially subtract that reference 
from the combined ENF traces, recovering a usable ENF trace from the original recording (Su, Garg, Hajj- 
Ahmad, and Wu, 2013). 


[Figure 2 image: spectrogram with axes Time (seconds) and Frequency (Hz); the frequency axis spans 116 to 124 Hz]

Figure 2: Spectrogram of a re-recorded audio signal with overlapping ENF traces. 


Figure 3 shows an expanded and cleaned view of the ENF traces from two historical recordings that we 
know were recorded at the same time and in the same electrical network (Houston Audio Control Room). 
One ENF trace is from the landing of the Apollo 11 spacecraft on the Moon, as released to the broadcast 
media by the Public Affairs Officer (PAO) in Houston. The other ENF trace is from the intercom loop used 
by the Flight Director in the Mission Operations Control Room to coordinate the activities of the flight 
controllers during that landing. Because the voices of the astronauts appear in both recordings, we can be 
sure of the accuracy of our time alignment, which we performed manually. ENF traces from different sources 
might be stronger or weaker; for plotting purposes we have normalized the amplitude scale from each source 
to facilitate visual comparison. Despite some minor deviations (which result from recorded sounds that at 
some times happen to be at a frequency near that of the ENF trace), the ENF signals extracted from the 
two recordings present very similar variation patterns, sufficient to yield a high correlation coefficient 
(around 0.7) and thus a high-confidence automatic match. This example also illustrates that ENF traces
might also prove useful when seeking to align recordings made in different places at the same time, as might 
be the case, for example, in time-synchronized reconstruction of complex events. 
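
The comparison described above can be illustrated with a minimal sketch. It assumes two ENF traces that have already been extracted and time-aligned, and it computes the Pearson correlation coefficient after normalizing each trace. The sample values and the names `pao_trace` and `loop_trace` are hypothetical and are not taken from the recordings discussed here; the extraction and alignment steps are not shown.

```python
import math


def normalize(trace):
    """Rescale a trace to zero mean and unit standard deviation."""
    n = len(trace)
    mean = sum(trace) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in trace) / n)
    if std == 0:
        raise ValueError("trace is constant; cannot normalize")
    return [(x - mean) / std for x in trace]


def correlation(trace_a, trace_b):
    """Pearson correlation coefficient of two equal-length, time-aligned traces."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must be time-aligned and of equal length")
    za, zb = normalize(trace_a), normalize(trace_b)
    return sum(a * b for a, b in zip(za, zb)) / len(za)


# Hypothetical frequency deviations (Hz) from the nominal ENF value,
# one sample per second; these are illustrative numbers only.
pao_trace = [0.010, 0.014, 0.009, 0.003, -0.002, -0.006, -0.001, 0.005]
loop_trace = [0.021, 0.030, 0.016, 0.008, -0.001, -0.015, -0.004, 0.012]

r = correlation(pao_trace, loop_trace)
print(f"correlation coefficient: {r:.2f}")
```

Because the coefficient is unaffected by rescaling either trace, a stronger or weaker ENF signal in one source does not by itself change the result; deciding what value counts as a match remains a separate judgment.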


2 The effect is most easily seen when this figure is viewed in color. 
Figure 3: ENF traces extracted from two concurrent Apollo 11 audio recordings. 


The cumulative coverage of the early space missions in the 1960s and 1970s amounts to about a year of 
audio in total, with continuous coverage of between a few days and a few months for any one mission, 
mostly recorded in the Texas electrical network. For more nearly continuous coverage, broadcasting provides 
one possible source. We therefore have also experimented with evening news programs from two different 
networks that had been broadcast at the same time. For this purpose, we chose the 5:30 PM news programs 
from CBS and NBC that were broadcast on August 9, 1974, the day Richard Nixon resigned and Gerald 
Ford became President of the United States. 

As Figure 4 shows, each recording has significant energy around 60 Hz and its harmonics. No match 
between the ENF traces could be found, however. Figure 5 clearly illustrates the reason: the live broadcast 
from news announcers accounts for only a small part of each program, and those parts did not often occur 
at the same times. Other parts of the programs include field reports (which might have been pre-recorded) 
and commercials (which were usually pre-recorded). With some effort we might be able to reconstruct some 
longer alignments from all of these short segments, but we now believe that our time would better be spent 
with alternative sources for long-period live broadcasting, including for example sports events and C-SPAN. 

Figure 4: Spectrograms for the August 9, 1974 CBS and NBC evening news recordings. 


4 Conclusions and Future Work 


We have shown that recovering ENF traces from mid-twentieth century audio recordings is indeed both 
possible and practical. Moreover, our survey of twentieth century recordings indicates that substantial 
amounts of content that was recorded at known times are available for the second half of the century, with 
far spottier coverage before that. 

Of course, much remains to be done. For one thing, the recovery of ENF traces from moving images 
has not yet been tried with motion picture film or videotape, so the applicability of video-based 
techniques to the era before digital video remains to be determined. For another, a more complete set of 
reference ENF traces might be constructed by chaining together overlapping recordings, only the first (or 
last) of which has a known time. However, we do not yet know how common sufficiently long overlapping 
segments will be. 

One important implication of our work for archival practice is that what cannot be heard or seen 
may nonetheless be important. Of course, this broad principle is well appreciated by archivists, and it is 
the driving force behind archival audio standards such as Broadcast Wave Format. It is surely useful to 
have an example such as this one at hand, however. A second implication for archival practice is that the 
temporal resolution of existing metadata is a limiting factor in our ability to project that timing data to 
new content. This may argue for reconsidering the temporal resolution of metadata for certain types of 
sources (e.g., live broadcasting) that can be particularly useful as a basis for establishing reference ENF 
traces. Of course, a third implication of ENF traces for archival practice is that they can provide an 
additional basis for assessing authenticity of recordings whose authenticity might otherwise be open to 

Figure 5: Approximate timeline showing news show breakdown by clip type. 


Abstract 

Underlying Jella Lepman's founding of the International Youth Library in Munich, Germany, in 1949 is 
the firm belief that reading high quality, culturally specific literature can lead young readers to gain 
cultural competence. However, there is little systematic research upon which this assertion can be based. 
This research study, in the preliminary and early results stage, is designed to help fill this evidence gap. 
The content, culture, and technology in this study that have the potential to break down global walls 
are explained. Focus was on a convenience sample of 9- and 10-year-old readers at 12 urban schools who 
participate in a Global Reading Challenge at their school, regional, and district level based on twelve 
culturally specific books. The researchers employed a pre-post test mixed methods design with qualitative 
methods (interview) based on quantitative methods (survey) and used three models of cultural 
competence to analyze the data. 

1 Introduction 


In 1949, following the drastically inhumane events of World War II, Jella Lepman, an internationally 
recognized journalist, established the International Youth Library in Munich, Germany. She ardently 
believed that high quality literature for youth provides an important vehicle for promoting peace and 
understanding and that it could play an important role in preventing another such conflict (Lepman 2002). 
Today the International Youth Library is the largest library for international children's and youth literature 
in the world, with 32 full-time staff and facilities for researchers, teachers, and youth to examine and study 
the best literature from many countries. Lepman also spearheaded the effort to establish the International 
Board on Books for Young People (IBBY), an organization founded in 1953 and committed to furthering the 
goals of understanding across numerous countries of the world. Many countries now have national or regional 
chapters. One of IBBY’s main proclamations is “the right of the child to a general education and to direct 
access to information” (IBBY 2013). 


1.1 Problem Statement 


Inspired by Lepman’s work, scholars and professionals have expanded upon it, leading to the frequent 
statement that such high quality, culturally specific literature can serve as mirrors for children to see 
themselves, and windows to see and understand youth from cultures that differ from their own as well as 
doors that encourage them to enter and experience those other cultures for ultimate understanding. Those 
who make such statements are as convinced of their truth as Lepman herself was, but when we, 
as scholars, are challenged to back these statements up with empirical evidence, we find that there is little 
or no systematic research upon which these assertions can be based. We just ‘know’ it is true. 

This research study, in the preliminary and early results stage, is designed to help fill this evidence 
gap with systematically gathered concrete evidence about the results of reading high quality, culturally 
specific, globally-oriented literature for children 9 and 10 years of age. It focuses on a convenience sample of 
children from the 2500 in 45 schools who are participating in a Global Reading Challenge. It also seeks 
evidence of the impact of a shared reading experience for digital youth who are part of a world dominated 
by interactive social media. 


1.2 Research Questions 


RQ1 What effect, if any, does the close reading of high quality, culturally specific, globally-oriented youth 
literature across a diversity of experiences have on children’s information about the cultures about which 
they read? 

RQ2 What effect, if any, does the close reading of high quality, culturally specific, globally-oriented youth 
literature across a diversity of experiences have on children’s gaining active cultural competence in relation 
to the cultures about which they have read? 

RQ3 What effect, if any, does the close reading of high quality, culturally specific, globally-oriented youth 
literature across a diversity of experiences as part of an interactive team experience have on children’s 
information about the cultures about which they read? 

RQ4 What effect, if any, does the close reading of high quality, culturally specific, globally-oriented youth 
literature across a diversity of experiences as part of an interactive team experience have on children’s 
gaining active cultural competence in relation to the cultures about which they have read? 


1.3 Context Setting and Background 


The Global Reading Challenge (GRC) originated in the Kalamazoo, Michigan Public Library approximately 
two decades ago. Mary Palmer, a librarian, brought the program to the Seattle, Washington Public Library 
18 years ago and has been the coordinator of it since. A few years into the program, the Vancouver, British 
Columbia Public Library and Public Schools joined the Challenge so at various times 3 public libraries and 
3 public school systems have been involved nationally and internationally. The current research is situated 
in the Seattle Public Library and Public Schools. Participation in the teams is voluntary informal learning, 
administered by librarians and to some extent teachers. The challenges start with teams competing by 
responding to questions at the school level, then a regional level, and finally a citywide or in some cases 
international final level. 
Two of the five Global Reading Challenge (GRC) goals related to this research are to 


• Increase teamwork and cooperative thinking skills. 
• Share quality children’s literature that represents a diversity of experiences at a variety of reading 
levels. (Seattle Public Library Foundation, 2013) 


Each year the Global Reading Challenge program coordinator selects ten high quality books, designed 
to meet the goals of the program, upon which the challenge will be based. These high quality books 
contain culturally specific, globally-oriented youth literature across a diversity of experiences. Books are 
provided for participating teams, who share among their members. 

For the first time, in 2012–13, an outside evaluator was hired to conduct a study about the extent 
to which the program was meeting its goals. The results were overwhelmingly positive (Seattle Public 
Library Foundation and Moreno 2013). Respondents agreed that participants’ teamwork skills were 
significantly improved. However, the one goal whose impact was not addressed in this evaluation is the 
second one listed above. The children obviously did share this type of literature but no assessment was 
made of its impact and no attempt was made to answer the related research questions posed by this study. 


1.4 Theoretical and Conceptual Frameworks 


The researchers employed three models or theoretical frameworks to develop the data collection instruments 
and later to analyze the results of the data collected and to reach some answers to RQ1 and RQ2. The first 
of these is Critical Multicultural Theory, the application of which to children’s literature is most thoroughly 
developed by Botelho and Rudman (2009). According to this theory, “children’s literature is read against 
its sociopolitical context. Readers ascertain what cultural themes are imbedded in the work” (5). Children 
must read the ten selected books quite closely, which exposes them to sociopolitical context where it exists. 
The research will reveal whether the students understand the sociopolitical context. 

A second model the researchers employed is one that Deborah Abilock developed to measure where 
on a continuum her students are in achieving cultural competency. Her visual representation introduces the 
notion that students, teachers, and librarians can work to become more and more culturally competent 
along a scale that runs in six stages from Destructiveness to Cultural Proficiency. 


[Diagram: A Continuum of Cultural Proficiency, with stages running from Destructiveness through 
Incapacity, Blindness, Precompetence, and Competence to Proficiency.] 


Reprinted with permission of Deborah Abilock from Deborah Abilock, “Educating Students for 
Cross-Cultural Proficiency.” Knowledge Quest 35 (Nov/Dec): 11. 


Figure 1: A Continuum of Cultural Proficiency 


The third model the researchers employed is a more impact-oriented model addressing changes in curricular 
content, in this case book content, that teachers can make to help students move toward becoming culturally 
competent. This model, by James Banks (2009), delineates four Levels of Integration of Ethnic Content to 
guide teachers who are striving to lead their students to cultural competency. 

Level 4 
The Social Action Approach 


Students make decisions on important social issues and 
take actions to solve them. 


Level 3 
The Transformative Approach 


The structure of the curriculum is changed to enable 
students to view concepts, issues, events, and themes 
from the perspectives of diverse ethnic and cultural 
groups. 


Level 2 
The Additive Approach 


Content, concepts, themes, and perspective are added to 
the curriculum without changing its structure. 


Level 1 
The Contributions Approach 


Focuses on heroes, holidays, and discrete cultural 
elements. 


Source: Figure 1.6 Levels of Integration of Ethnic Content 

Copyright © 2004 by James A. Banks 

Reprinted with the permission of James A. Banks from James A. Banks, Teaching Strategies for Ethnic 
Studies. Boston: Pearson Allyn and Bacon, 2009, page 19. 




Figure 2: Levels of Integration of Ethnic Content 


The lowest level focuses on food, holidays, heroes and other discrete cultural elements while the final level 
presents the outcome of becoming culturally competent, i.e., students are involved in grappling with real 
social issues and in taking action to stop injustices. The final notes, if accepted, will give statistics for book 
content falling at each level. 

Construction of instruments and analyses for RQ3 and RQ4 were based on models of connected 
learning with digital youth (What is Connected Learning? 2013), brought to the attention of many by the 
MacArthur and Mozilla Foundations’ digital media and learning cooperative initiatives. The idea of shared 
competence through the principles of equity and social connection upon which connected learning is based 
harkens back to Lepman’s philosophy of children’s literature as bridges to understanding. 


1.5 Methodology 


The researchers, including three faculty members, of whom one is the project statistician, are using an 
explanatory sequential mixed-methods design (Creswell, 2014) with a quantitative phase that informs a 
subsequent qualitative phase. The population consists of a sub-sample of the 4th and 5th grade students in 12 
of the 45 schools participating in Global Reading Challenge Teams in the 2013–14 school year in Seattle. 
The schools were chosen to represent the variety of socioeconomic levels and reading achievement scores 
found throughout the city. It is noteworthy that the ten finalists in the previous year included both a 
school with the highest and a school with the lowest proportion of children on free and 
reduced lunch, a U.S. measure of poverty commonly used in schools. 

Institutional Review Board approval was obtained from the University of Washington as well as 
permission from the Seattle Public Library and Seattle Public Schools. Parents were asked for their 
permission for their children to participate in the study. 

The initial instrument is a survey that the researchers developed and pretested with the assistance 
of the Social Development Research Group at the University of Washington. It consists of a web-based 
test with first-person statements to which students respond on a 5-point Likert scale. The questions focus 
on the readers’ attitudes toward the cultures about which they read and their interest in active involvement 
with these cultures (RQ1 and RQ2). For example, “I have read a book that makes me want to experience 
a culture other than my own.” 

This is a pre-post-test design to ascertain whether there is any significant change among the 
students collectively after they read the books. The results of this quantitative survey inform the interview 
questions, addressing all four research questions, that will be posed to a selected group of young readers, a 
purposeful sample consisting of approximately 20 children representing the spectrum of readers across 
socioeconomic levels. These questions will probe more deeply into the information the young readers gained, 
their attitudes and possible actions as a result of reading, as well as into their connected learning experiences 
as team members and as part of a larger competition. 
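
One way such a pre-post comparison of Likert responses might be checked is sketched below. The statistical test to be used is not specified here, so the exact sign test is an assumption chosen only for illustration, and the function name `sign_test` and all response values are hypothetical.

```python
from math import comb


def sign_test(pre, post):
    """Two-sided exact sign test on paired ordinal responses.

    Returns (n_increase, n_decrease, p_value). Pairs with no change are dropped.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post responses must be paired")
    ups = sum(1 for a, b in zip(pre, post) if b > a)
    downs = sum(1 for a, b in zip(pre, post) if b < a)
    n = ups + downs
    if n == 0:
        return ups, downs, 1.0
    k = min(ups, downs)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return ups, downs, min(1.0, 2 * tail)


# Hypothetical responses (1 = strongly disagree ... 5 = strongly agree) from
# ten students to one statement, before and after reading the books.
pre = [2, 3, 2, 1, 3, 2, 4, 2, 3, 2]
post = [4, 4, 3, 3, 3, 4, 4, 3, 5, 3]

ups, downs, p_value = sign_test(pre, post)
print(f"increased: {ups}, decreased: {downs}, p = {p_value:.4f}")
```

The sign test uses only the direction of each student's change, which suits ordinal Likert data; a test that also weighs the size of the change would be an alternative choice.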


2 Culture 


The term culture can have many different meanings. For this research, we define culture as “the behaviors 
and beliefs characteristic of a particular social, ethnic, or age group” (dictionary.com 2013). Specifically, 
we are referring to the culture of contemporary young people from various ethnicities who live in a digital 
culture. Although it is unrealistic to propose that reading up to ten books about varying cultures will prompt 
a young person to become a viable part of any one of the second cultures portrayed, it is instructive to 
think of a long-term goal: how might that look should the reader become interested enough to pursue 
more information and even engage in real-life experience with the culture? That is, what is an ‘ideal’ approach 
to cultural competency or proficiency in a second culture? According to LaFromboise et al. (1993), there are 
many different aspects of cultural competency. In this study we focus on two of these: “display 
sensitivity to the affective processes of the culture” and “perform socially sanctioned behavior” (396). 
Furthermore, LaFromboise et al. propose five ways in which individuals can achieve this cultural competency: 


• through assimilation (becoming part of a second culture, one of which is considered dominant or 
more desirable), 

• acculturation (knowing about but not participating in a second culture, one of which is considered 
dominant), 

• alternation (knowing about two different cultures, neither of which is considered dominant, and 
altering behavior to fit a particular social context within either culture), 

• multicultural model (maintaining distinct identities while individuals from one culture work with 
those of other cultures to serve common . . . needs, neither considered dominant), 

• fusion (melting pot where former cultures blend indistinguishably). 


These authors propose that the alternation model, in which a person becomes comfortable in their own 
culture as well as a second culture, is ideal. For the youth involved in this study, thinking in terms of easily 
moving back and forth between two cultures, not assuming one or the other is dominant, is useful in 
questioning whether the reading experience will begin to lead to an initial level of cultural competency in 
two or more distinctly different cultures. This is an issue that has never, to the best knowledge of the 
current researchers, been investigated, even on a more or less superficial or beginning basis. 


3 Technology 


The third theme addressed by the conference, computers or in the broader sense technology, relates to this 
study in two ways. It is a vehicle for the connected learning that possibly reinforces the gaining of cultural 
competency. One of the most basic technologies, the printed book, is the principal instrument for connections 
among these young readers; the books provide the basis for study and reflection as well as information. 
Other electronic or digital technologies have figured into the Global Reading Challenge over the years, 
including videoconferencing that connects the British Columbia and Seattle participants when both 
participate. Another type of facilitating technology is audiobooks, provided for youth whose print reading 
abilities or second language skills make reading print in English difficult. 


4 Preliminary or Early Results 


Despite the fact that the previous evaluation of this project did not consider cultural impact or impact of 
diversity of experience, some youth volunteered this information anyway. One student wrote “Thank you 
Ms. Sherman for showing me that I can change racism. And stand up to the people that are racist.” Many 
others mentioned being empowered to stand up to bullies. One of the classes involved in last year’s study 
read a book about the shortage of water in Sudan and became involved in the Sudan Water Project. The 
current survey instrument has been constructed, pretested, and administered to the students in the 12 
participating Seattle schools; it was administered before the current year’s reading list was distributed to 
students. As might be expected, young readers knew little about cultures other than their own at the 
beginning and had no plans for action. The results of this first administration of the instruments have been 
coded and stored for comparison with the post Global Reading Challenge survey and interview to be 
administered shortly before the conference in the spring. These results will be reported at the iConference. 


5 Significance of Study 


The results of this study will provide the first known research-based evidence about the impact on cultural 
competence of young people’s close reading of high quality, culturally specific, globally-oriented youth 
literature across a diversity of experiences. Jella Lepman and many others have held firm to the belief that 
this type of reading can ultimately lead to cultural competency and world peace. This first step determines 
what information at least one group of young readers gains about other cultures through reading in a 
connected learning situation and how their attitudes and actions are affected by such reading. It fills a gap 
in the research literature and aligns closely with the themes of the 2014 iConference. 


6 References 


Botelho, M. J., and Rudman, M. K. (2009). Critical Multicultural Analysis of Children’s Literature: Mirrors, 
Windows, and Doors. Kindle ebook edition. New York: Routledge, unpaged. 

Dresang, E. (2013). Gaining cultural competence through youth literature. In J. C. Naidoo and S. Dahlen 
(Eds.), Diversity in youth literature: Opening doors through reading. American Library Association. 

LaFromboise, T., Coleman, H. L., and Gerton, J. (1993). Psychological impact of biculturalism: Evidence and 
theory. Psychological Bulletin 114:3 (395-412). 

Lepman, Jella (2002, 1969). A Bridge of Children's Books: The Inspiring Autobiography of a Remarkable 
Woman. Rev. ed. O’Brien Press. 

Seattle Public Library Foundation (2013). The 2012-13 Global Reading Challenge Final Report. 

What is Connected Learning?: Connected Learning Principles. Accessed August 17, 2013 at 
http://connectedlearning.tv/connected-learning-principles 


7 Table of Figures 


Figure 1: A Continuum of Cultural Proficiency 
Figure 2: Levels of Integration of Ethnic Content 




Using Visual Analytics to Enhance Data Exploration and Knowledge Discovery in 
Financial Systemic Risk Analysis: The Multivariate Density Estimator 


Victoria L. Lemieux, Benjamin W.K. Shieh, David Lau, Sun Hwan Jun, Thomas Dang, Johnathan Chu 
and Geran Tam 
iSchool, The University of British Columbia 
Media and Graphics Interdisciplinary Centre, The University of British Columbia 


Abstract 

Analyzing and managing the risks in financial systems is necessary to maintain healthy global financial 
systems and economic wellbeing. However, the complexity of the financial system and the heterogeneity 
and volume of data sources needed for financial systemic risk analysis are currently overwhelming. Visual 
Analytics tools can be used to provide macroprudential supervisors with greater visibility into the health 
of financial systems by augmenting their information processing capabilities. To this end, we present a 
novel prototype design for a visual analytics tool that implements the multivariate density estimator of 
financial systemic risk, explaining how it addresses macroprudential supervisors’ need for enhanced data 
exploration and knowledge discovery capabilities. 

1 Introduction 


The financial crisis of 2007-2009 led to many bank failures and a global credit crunch that drew attention 
to the need for enhanced financial systemic risk analysis capabilities (Lemieux, 2013). A Financial Stability 
Board and International Monetary Fund report on “The Financial Crisis and Information Gaps” noted that, 
“Indeed, the recent crisis has reaffirmed an old lesson—good data and good analysis are the lifeblood of 
effective surveillance and policy responses at both the national and international levels” (FSB/IMF, 2009). 
Given this, it is worthwhile exploring how new approaches to the analysis of large and complex data sources 
might be used to enhance financial systemic risk analysis capabilities. One approach is to apply Visual 
Analytics (VA), defined as the science of analytical reasoning facilitated by interactive visual interfaces 
(Thomas and Cook, 2005). VA is particularly suited to addressing information processing challenges as it 
combines machine intelligence with the visual and cognitive intelligence of human analysts through the use 
of interactive visual interfaces. 

This paper presents a design study of a VA system to support financial systemic risk analysis. The 
presentation of the study follows Munzner’s Nested Model for Visualization Design and Validation 
(Munzner, 2009). The model divides visualization design into four levels: 1) characterize the problems of 
the real-world users; 2) abstract into operations on data types; 3) design visual encoding and interaction 
techniques; and 4) create algorithms to execute techniques efficiently. As our design study makes a 
contribution in only the first three areas, we will focus the discussion on these areas of the Nested Model. 


2 Domain Problem Characterization 


A VA tool designer must learn about the tasks and data of users in the target domain (Munzner, 2009). To 
do this, we reviewed literature in the field of financial systemic risk analysis to understand how it is 
characterized. To complement our literature review, we held an interdisciplinary workshop where financial 
systemic risk experts met with visual analytics experts to characterize the domain problems (Lemieux, 
2012a). The workshop was supplemented by an interdisciplinary panel discussion on the application of VA 
to financial systemic risk analysis and further discussions of the topic at meetings of experts in the field of 
financial systemic risk analysis (Lemieux, 2012b). 

Having conducted research to characterize the domain problems and to learn about the tasks and 
data of users in the target domain, we identified several key domain problems. Domain experts placed 
emphasis on one of these in particular, which was the need to understand counterparty networks and 
interconnectedness of financial institutions. There are many different measures of financial 
interconnectedness; however, we chose to rely upon the multivariate density estimator (MDE) approach, 
which captures time-varying distress dependence among banks, in the design of our tool (Segoviano and 
Goodhart, 2009). The MDE approach consists of a set of measures to analyse stability from three 
complementary perspectives by allowing the quantification of: 1) common distress in the banks of a system, 
2) distress between specific banks, and 3) distress in the system associated with a specific bank, i.e., a 
cascade effect. We acknowledge that the MDE 
approach is but one of many measures of financial interconnectedness, and a robust approach to financial 
systemic risk analysis will benefit from tools that support a variety of approaches. 


3 Abstract into Operations on Data Types 


Following Munzner’s (2009) Nested Model, in the abstraction stage we mapped problems and data from the 
vocabulary of the domain of financial systemic risk into the more abstract and generic description that is 
in the vocabulary of information visualization and visual analytics. The output is 1) a description of 
operations (i.e., generic tasks) and 2) a description of data types, which are the input required for making 
visual encoding decisions at the next level. 

To achieve a description of operations, we first extrapolated high-level domain specific tasks from 
the literature on financial network analysis. We then linked the higher analytic tasks to several relevant 
formative evaluation frameworks in the field of information visualization and visual analytics. For example, 
we used Amar and Stasko’s (2004) knowledge task-based taxonomy aimed at addressing complex decision- 
making, especially under uncertainty. We also applied Yi et al. (2007), who have proposed a taxonomy of 
interaction techniques, which they define as “the features that provide users with the ability to directly or 
indirectly manipulate and interpret representations” (p.2). Finally, Lee et al. (2006) propose a task 
taxonomy specifically for graph exploration, which is suited for use in interactions with networks. 

Raw data inputs for computation of the MDE measure are: (1) a list of banks denoted by their 
ticker symbols and (2) a table of Credit Default Swap (CDS) values for each bank, which we obtained 
from Bloomberg. CDS products are used because they act as a signal to the market about the viability of the 
underlying financial institution (Credit Default Swaps, 2013). The raw data were held in two .csv files. 
Computation on the raw data, using the algorithms from Segoviano and Goodhart, produced derived data 
consisting of: (1) values for a Joint Probability of Distress (JPoD) and (2) values for a Distress Dependence 
Matrix (DDM) (Segoviano and Goodhart, 2009). The derived data were output to two .csv files. We 
supplemented these data with a table of market capitalization values (once again obtained from Bloomberg) 
per bank, also in .csv format, in order to be able to represent the size of each bank relative to the market 
as a whole. 
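
As a rough illustration of the last step, the sketch below reads a bank list and a market capitalization table from .csv text and computes each bank's size relative to the listed banks as a whole. The column names (`ticker`, `market_cap`), the tickers, and the values are assumptions made for this example; the layout of the actual files is not described here, and the JPoD and DDM computations are not reproduced.

```python
import csv
import io

# Hypothetical file contents standing in for the bank list and the
# market capitalization table; real column names and values may differ.
BANKS_CSV = "ticker\nAAA\nBBB\nCCC\n"
MARKET_CAP_CSV = "ticker,market_cap\nAAA,120.0\nBBB,60.0\nCCC,20.0\n"


def read_tickers(text):
    """Return the list of bank tickers from .csv text with a 'ticker' column."""
    return [row["ticker"] for row in csv.DictReader(io.StringIO(text))]


def read_market_caps(text):
    """Return {ticker: market capitalization} from .csv text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["ticker"]: float(row["market_cap"]) for row in reader}


def market_shares(tickers, caps):
    """Each bank's share of the total capitalization of the listed banks."""
    total = sum(caps[t] for t in tickers)
    if total <= 0:
        raise ValueError("total market capitalization must be positive")
    return {t: caps[t] / total for t in tickers}


shares = market_shares(read_tickers(BANKS_CSV), read_market_caps(MARKET_CAP_CSV))
for ticker, share in shares.items():
    print(f"{ticker}: {share:.1%} of listed market capitalization")
```

Keeping the bank list separate from the value tables, as in the files described above, lets the same list drive every lookup, so a bank missing from one table surfaces as an error rather than being silently skipped.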


4 Design Visual Encoding and Interaction Techniques 


Once we had identified the basic operations and transformed the raw data into the derived data, it was 
possible to determine the type of visualization best suited to representing the data and likely to meet the 
task requirements. We had a number of options to choose from. Networks of financial relationships, like 
other types of networks, are typically abstracted as graphs. Graphs may be represented as matrices, as 
node-link graphs, or as trees. In much of the literature that we reviewed for this study, financial 
interconnections are represented as node-link graphs. Many of these graphs are overplotted and difficult to 
interpret. 

We chose to visually encode our derived data using a Treemap (Johnson and Shneiderman, 1991). 
Technically, a tree is a connected, unweighted acyclic graph. There are two types of tree visualization: 
space-filling and 
non-space filling. A Treemap is of the space-filling variety and therefore uses screen real estate very 
efficiently compared to non-space-filling visualizations. For this reason, we judged that a Treemap would 
avoid the over-plotting and occlusions that make node-link graphs representing large networks difficult to 
read, though we acknowledge that Treemap also has limitations in the representation of large datasets. 
That is, banks that are small in size relative to the market as a whole may not be clearly visible in displays 
of large networks. Recalling our high-level domain tasks, we also saw the Treemap as an effective way to 
represent the topology of the network at different levels of granularity as it has been used to do in other 
financial application areas (Smartmoney™, 2013). 

In our system, a quick glance at the Treemap visualization provides financial systemic risk analysts 
with an instant overview of the status of the financial system and the number of banks that may be in 
danger of default. The first Treemap visualization, the PoD view, shows the joint probability of distress 
among all banks in the system. The Treemap visualization maps the size of the rectangle to the market 
capitalization of each bank, so that an analyst can quickly infer the impact of a bank’s failure on the market 
as a whole, i.e., whether it might be “too big to fail”. Colour is used to represent whether the bank’s
probability of default value is above (red) or below (blue) a user-defined threshold. The analyst is able to 
interact with the Treemap visualization by changing the default threshold value in a separate input box. 
This provides the analyst with the ability to conduct “what if” scenario analysis to instantly see the impact 
on the financial system’s stability as the threshold value is changed. The Treemap visualization is reinforced 
by a separate Bullet graph visualization that shows the relative probability of distress values for each bank 
using colour (black) and bar length. The user-defined default threshold is represented by a background
display of colour (blue) and bar length. Users are able to change the default threshold in the Bullet graph 
by pointing and clicking on the blue slider bar to move the threshold value up or down. Linking techniques 
automatically update the Treemap visualization to reflect how changes in the threshold value affect stability 
of the financial system. Having understood the status of the financial system as a whole, an analyst might 
wish to understand the effect that a default of a particular bank would have on other banks in the system.
The second visualization in our multiview system, the Distress Dependence Matrix (DDM), supports this
analytic task. By switching to the DDM view, the analyst is able to see the probability of default of all 
connected banks in the event that a particular bank defaults. In this view, the bank of interest is greyed 
out, while all other banks are either blue (safe) or red (at risk of default). The bullet graph view is retained 
in this view to indicate the distress level of each bank against the adjustable user-defined threshold to
support “what if” scenario analysis.
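
The visual encoding described above can be illustrated with a short sketch. This is not the MDEV implementation: the function name `encode_banks`, the bank names, and all numeric values are hypothetical, and treating a value exactly equal to the threshold as "safe" is an assumption. Rectangle area is taken as each bank's share of total market capitalization, and colour depends on whether the probability of distress exceeds the user-defined threshold.

```python
from typing import Dict, List, Optional


def encode_banks(
    market_cap: Dict[str, float],
    distress: Dict[str, float],
    threshold: float,
    focus: Optional[str] = None,
) -> List[dict]:
    """Map each bank to a relative rectangle area and a colour.

    market_cap: market capitalization per bank (any consistent unit).
    distress:   probability of distress per bank; in the DDM view these would
                be the values conditional on the focus bank defaulting.
    focus:      bank of interest in the DDM view (greyed out), or None.
    """
    total = sum(market_cap.values())
    cells = []
    for bank, cap in market_cap.items():
        if bank == focus:
            colour = "grey"
        elif distress[bank] > threshold:
            colour = "red"   # above the threshold: at risk
        else:
            colour = "blue"  # at or below the threshold (assumption): safe
        cells.append({"bank": bank, "area": cap / total, "colour": colour})
    return cells


# Hypothetical values, for illustration only.
caps = {"Bank A": 120.0, "Bank B": 60.0, "Bank C": 20.0}
pod = {"Bank A": 0.02, "Bank B": 0.08, "Bank C": 0.05}

for cell in encode_banks(caps, pod, threshold=0.05):
    print(cell)
```

Calling the function again with a different `threshold` corresponds to the "what if" interaction: only the colours change, while the areas stay fixed.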
Figure 1: A screenshot of the MDEV tool showing the PoD view for five banks


5 Validation and Evaluation 


In this section we discuss what has been done to date in regard to validation and evaluation of our system. 
Following Isenberg et al. (2008), we used a grounded evaluation approach to validate our characterization 
of the domain. We validated that visual analytics was a legitimate approach to solving the information 
processing problems in the domain of interest through conducting the workshop and panel discussions on 
the application of visual analytics for financial systemic risk analysis. After we developed our first prototype, 
we validated this assumption again by demonstrating the system to two groups of experts in the field. Both 
groups confirmed that the system was useful and would support the ability to more readily see and 
understand financial interconnections in relation to financial distress. We acknowledge, however, that
ethnographic and cognitive work analysis would deepen our understanding of the domain problems and 
high-level analytic tasks. 

At the visual encoding and interaction design level, we undertook a formative evaluation with one 
expert group at which we received feedback that the way we had chosen to present the multiple views in 
our system was confusing to users. Initially, we had users click on a particular bank in the PoD Treemap 
visualization to semantically zoom into the DDM visualization. Users found this confusing because they 
expected details about the particular bank, rather than about the effect of the bank on other banks. This 
feedback led us to change how we presented the PoD and DDM views and to reserve semantic zooming
from one level of the Treemap visualization to another for use in providing greater detail on that bank’s 
balance sheet structure. Users also expressed a desire to see a view of the interconnections in the more 
traditional node-link graph representation, which we will incorporate into a later release of our system. 


6 Conclusion 


Addressing the information processing challenges that contributed to the global financial crisis remains a 
significant and unresolved challenge. Visual analytics, which combines the strength of machine information 
processing with the best of human information processing through the use of interactive visual interfaces,
offers a promising approach to addressing these challenges. Our system aims to make a novel contribution 
to the application of visual analytics in the domain of financial systemic risk analysis. It is, however, still a 
work in progress. 
Abstract

The Boston Marathon bombing story unfolded on every possible carrier of information available in the 
spring of 2013, including Twitter. As information spread, it was filled with rumors (unsubstantiated 
information), and many of these rumors contained misinformation. Earlier studies have suggested that 
crowdsourced information flows can correct misinformation, and our research investigates this 
proposition. This exploratory research examines three rumors, later demonstrated to be false, that 
circulated on Twitter in the aftermath of the bombings. Our findings suggest that corrections to the 
misinformation emerge but are muted compared with the propagation of the misinformation. The 
similarities and differences we observe in the patterns of the misinformation and corrections contained 
within the stream over the days that followed the attacks suggest directions for possible research 
strategies to automatically detect misinformation. 
1 Introduction


Social media use is becoming an established feature of crisis events. Affected people are turning to these 
sites to seek information (Palen & Liu, 2007), and emergency responders have begun to incorporate them 
into communications strategies (Latonero & Shklovski, 2011; Hughes & Palen, 2012). Not surprisingly, one 
concern among responders and other officials is the rise of misinformation on social media. In recent crises, 
both purposeful misinformation, introduced by someone who knew it to be false, and accidental 
misinformation, often caused by lost context, have spread through social media spaces and occasionally 
from there out into more established media (Hill, 2012; Herrman, 2012). 

In a study on Twitter use after the 2010 Chile earthquake, Mendoza et al. (2010) claimed that 
aggregate crowd behavior can be used to detect false rumors. They found that the crowd attacks rumors 
and suggested the possibility of building tools to leverage this crowd activity to identify misinformation. 
However, currently there are no such tools, and the notion of the self-correcting crowd may be overly 
optimistic. After Hurricane Sandy, a blogger claimed to have witnessed the “savage correction” by the 
crowd of false information spread by an aptly named Twitter user, @comfortablysmug (Herrman, 2012),
yet many were guilty of retweeting this and other misinformation during the tense moments when Sandy 
came ashore (Hill, 2012). 

This research, which focuses on the use of Twitter after the 2013 Boston Marathon bombings, seeks 
to understand how misinformation propagates on social media and explore the potential of the crowd to 
self-correct. We seek to better understand how this correction functions and how it varies across different 
types of rumors. Our larger goal is to inform solutions for detecting and counteracting misinformation using 
the social media crowd. 
2 Background


2.1 The Event: 2013 Boston Marathon Bombing 


At 2:49 pm EDT on April 15, 2013, two explosions near the finish line of the Boston Marathon killed three 
people and injured 264 others (Kotz, 2013). Three days later, on April 18 at 5:10pm EDT, the Federal 
Bureau of Investigation (FBI) released photographs and surveillance video of two suspects, enlisting the 
public’s help to identify them. This triggered a wave of speculation online, where members of the public 
were already working to identify the bombers from photos of the scene (Madrigal, 2013a). Shortly after the 
photo release and a subsequent related shooting on the MIT campus, a manhunt resulted in the death of 
one suspect and the escape of the other. Following the shootout, at 6:45 AM on April 19, the suspects were
identified as brothers Tamerlan and Dzhokhar Tsarnaev (FBI, 2013). Dzhokhar, the surviving brother and
“Suspect #2” from the FBI’s images, was found and arrested on April 19 at 9pm EDT. 


2.2 Social Media Use during Crisis Events 


Researchers in the emerging field of crisis informatics have identified different public uses of social media 
during crises: to share information (e.g. Palen & Liu, 2007; Palen et al., 2010; Qu et al., 2011), to participate 
in collaborative sense-making (Heverin & Zach, 2012), and to contribute to response efforts through digital 
volunteerism (Starbird & Palen, 2011). Social media are a potentially valuable resource for both affected 
people and emergency responders (Palen et al., 2010). Twitter in particular has been shown to break
high-profile stories before legacy news media (Petrovic et al., 2013). This research focuses on misinformation
(false rumors) shared through Twitter in the aftermath of the Boston Marathon bombing on April 15, 2013. 


2.3 Misinformation on Twitter


Misinformation on social media represents a challenge for those seeking to use it during crises. This concern
has been voiced in the media (Hill, 2012) and by emergency responders who are reluctant to depend on it 
for response operations (Hughes & Palen, 2012). A few emergency managers who were early adopters of 
social media note that identifying and counteracting rumors and misinformation are important aspects of 
their social media use (Latonero & Shklovski, 2011; Hughes & Palen, 2012). 

Mendoza et al. (2010) found that Twitter users question rumors, and that false rumors are more 
often questioned than rumors that turn out to be true. They theorized that this crowd activity could be 
used to identify misinformation. 


2.4 Diffusion of Information on Twitter 


An important aspect of the misinformation problem on social media relates to the diffusion of information. 
On Twitter, the retweet (RT @username) functions as a forwarding mechanism. During crisis events, a
large percentage of tweets are retweets, which spread very quickly. Kwak et al. (2010) reported 50% of 
retweets occur in the first hour after a tweet is shared and 75% within the first day. As this information 
diffuses, it loses connection to its original author, time, and the context in which it was shared, an effect
that complicates verification.


3 Method 


3.1 Data Collection 

We collected data using the Twitter Streaming API, filtering on the terms: boston, bomb, explosion, 
marathon, and blast. The collection began April 15 at 5:25pm EDT and ended April 21 at 5:09pm EDT. 
During high-volume periods, the collection was rate-limited at 50 tweets per second. Additionally, the
collection went down completely (Figure 1, black rectangle) and experienced two windows of repeated short 
outages (Figure 1, grey rectangles). 
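
The term filter can be approximated offline with a few lines of code. The sketch below is illustrative only: the function name `matches_track_terms` is hypothetical, and case-insensitive substring matching is an assumption, since the Streaming API applies its own matching rules on the server.

```python
# The five terms listed in the text.
TRACK_TERMS = ("boston", "bomb", "explosion", "marathon", "blast")


def matches_track_terms(text: str, terms=TRACK_TERMS) -> bool:
    """Return True if any tracked term occurs in the tweet text.

    Case-insensitive substring matching is an assumption made for this sketch.
    """
    lowered = text.lower()
    return any(term in lowered for term in terms)


print(matches_track_terms("Explosion reported near the finish line"))
print(matches_track_terms("Nice weather today"))
```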
Figure 1: Tweet Volume Over Time


Our data set contains ~10.6 million (10.6M) tweets, sent by 4.7M distinct authors. 42% of tweets have URL
links and 56% are retweets using either of the two most popular conventions (RT @ or via @). Figure 1
shows the volume of tweets per minute.
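
One plausible way to flag retweets by the two conventions named above is a regular expression. This is a sketch under stated assumptions, not the authors' procedure: the pattern, the name `is_retweet`, and the choice to match case-insensitively and anywhere in the text are all hypothetical.

```python
import re

# Matches "RT @user" or "via @user". The leading word boundary prevents
# matches inside longer words such as "ART @user".
RETWEET_PATTERN = re.compile(r"\b(?:RT|via)\s*@\w+", re.IGNORECASE)


def is_retweet(text: str) -> bool:
    """Return True if the text uses the "RT @" or "via @" convention."""
    return RETWEET_PATTERN.search(text) is not None


# Hypothetical sample texts, for illustration only.
sample = ["RT @example: stay safe", "Stay safe via @example", "Stay safe everyone"]
share = sum(is_retweet(t) for t in sample) / len(sample)
print(share)
```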


3.2 Exploratory Analysis: Identifying Rumors 

Our exploratory analysis identified salient themes and anomalies in the data. First, we enumerated the 100
most prevalent hashtags in the data and created a network graph of relationships between them (Figure 2).
Each node represents a popular hashtag and is sized by the log number of times it appears in the set. Each
edge connects two hashtags that appear in the same tweet and is sized by the log number of times they
co-occur.
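
The construction of the hashtag graph can be sketched as follows. The code is a minimal illustration using only the standard library; the function name `hashtag_graph`, the hashtag pattern, the per-tweet counting rule, and the sample tweets are assumptions rather than details taken from the study.

```python
import math
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


def hashtag_graph(tweets, top_n=100, drop=()):
    """Return (node_sizes, edge_sizes) for the top_n most frequent hashtags.

    A hashtag is counted once per tweet (assumption). Sizes are natural logs
    of the counts, so a count of 1 maps to a size of 0.
    """
    per_tweet = [
        {tag.lower() for tag in HASHTAG.findall(text)} - set(drop)
        for text in tweets
    ]
    counts = Counter(tag for tags in per_tweet for tag in tags)
    top = {tag for tag, _ in counts.most_common(top_n)}

    pairs = Counter()
    for tags in per_tweet:
        # Sorting gives each undirected edge a single canonical key.
        for a, b in combinations(sorted(tags & top), 2):
            pairs[(a, b)] += 1

    node_sizes = {tag: math.log(counts[tag]) for tag in top}
    edge_sizes = {pair: math.log(n) for pair, n in pairs.items()}
    return node_sizes, edge_sizes


# Hypothetical tweets, for illustration only.
tweets = [
    "#boston #prayforboston stay safe",
    "#prayforboston #marathon",
    "#tcot #falseflag #obama",
    "#falseflag #tcot",
]
nodes, edges = hashtag_graph(tweets, top_n=100, drop={"boston"})
print(nodes)
print(edges)
```

The `drop` argument mirrors the note under Figure 2 about removing a hashtag that connects with every other tag.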


Figure 2: Network Graph of Co-Occurring Hashtags in Boston Marathon Tweets 


*The #boston hashtag was dropped from this graph, because it connected with every other tag. 
Next, we examined tweets that contained specific hashtags to understand their context. A salient theme
was the presence of several rumors. For example, tweets with #prayforboston contained a highly tweeted 
(false) rumor about a young girl killed while running the marathon. An investigation of a politically
themed section of the graph (in light blue at the top left corner) revealed an interesting hashtag, #falseflag,
positioned between #tcot (which stands for “top conservatives on Twitter”) and #obama, that
accompanied rumors claiming U.S. government involvement in the bombings. 

Through this process, we created a list of rumors grounded in the Twitter data set. We chose three 
false rumors and proceeded to do a systematic analysis of tweets that referenced them. 


3.3 Analysis: Manual Coding of Tweets 


We selected search terms that resulted in samples that balanced comprehensiveness against low noise to identify
tweets related to each rumor. Then, following the method outlined by Mendoza et al. (2010), we coded each 
distinct tweet within each rumor subset. We used an iterative, grounded approach to develop the coding 
scheme, eventually settling on three categories: misinformation, correction, and other (which encompassed 
unrelated and unclear). Our categories align well with the categories used by Mendoza et al. (2010): affirm, 
deny, other. 


4 Findings 


4.1 Rumor 1: A Girl Killed While Running in the Marathon 


The most blatant false Twitter rumor focused on a photo of an eight-year-old girl running in a race,
accompanied by the claim that she died in the Boston attack. The rumor began just hours after the bombing.
Its history on Twitter reveals that at approximately 6:30pm EDT, @NBCNews announced that an
eight-year-old spectator had been killed in the bombings. About 45 minutes later, a Twitter user sent a message
ascribing a female gender to the victim with the assumption that she was a competitor: 


@TylerJWalter (April 15, 2013 7:15pm): An eight year old girl who was doing an amazing thing 
running a marathon, was killed. I can’t stand our world anymore 


Four minutes later, another user added a fake picture and purposefully spread the false rumor: 


@_Nathansnicely (April 15, 2013 7:21pm): The 8 year old girl that sadly died in the Boston 
bombings while running the marathon. Rest in peace beautiful x http://t.co/mMOi6clz21 


This original rumor was retweeted 33 times in our set, but it soon began to spread in many different forms, 
from different authors. We identified a set of 93,353 tweets (and 3275 distinct tweets) that contained both 
“girl” and “running.” After coding each distinct tweet and applying those codes to the larger set, we found 
92,785 tweets to be related to the rumor. 90,668 of these tweets were coded as misinformation and 2046 
tweets were corrections, resulting in a misinformation to correction ratio of over 44:1. This finding contrasts 
starkly with Mendoza et al.’s (2010) study, which found about a 1:1 ratio between the two.
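
The step of applying codes from distinct tweets to the larger set, and the resulting ratio, can be sketched as below. The function names are hypothetical, and matching tweets to coded distinct tweets by exact text is an assumption; the counts passed to `ratio` are the ones reported above.

```python
from collections import Counter


def apply_codes(tweets, codes_by_text):
    """Count tweets per code by looking up each tweet's text in the coded distinct set.

    Exact text matching is an assumption made for this sketch.
    """
    return Counter(codes_by_text.get(text, "uncoded") for text in tweets)


def ratio(misinformation: int, correction: int) -> float:
    """Misinformation-to-correction ratio, expressed as x in x:1."""
    return misinformation / correction


# Toy example of propagating codes (hypothetical texts).
coded = {"she was running": "misinformation", "this is false": "correction"}
print(apply_codes(["she was running", "she was running", "this is false"], coded))

# Counts reported in the text for Rumor 1.
print(round(ratio(90668, 2046), 1))  # 44.3
```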

Significantly, peak correction did occur within roughly the same hour as peak
misinformation, suggesting a reactive community response. Examining the volume at log scale reveals that the
volumes of correction and misinformation rise and fall in tandem much of the time, though the correction
often lags behind the misinformation. Perhaps most troubling, the graph shows
misinformation to be more persistent, continuing to propagate at low volumes after corrections have faded
away.
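
Volume curves of this kind are built by counting tweets in fixed time windows. The following sketch bins timestamps into 10-minute windows; the function name `bin_counts` and the sample timestamps are hypothetical, and timestamps are assumed to be naive `datetime` values in a single time zone.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_counts(timestamps, minutes=10):
    """Count timestamps per fixed-width window, keyed by the window start."""
    width = timedelta(minutes=minutes)
    epoch = datetime(1970, 1, 1)
    bins = Counter()
    for ts in timestamps:
        # Floor each timestamp to the start of its window.
        start = epoch + ((ts - epoch) // width) * width
        bins[start] += 1
    return bins


# Hypothetical timestamps, for illustration only.
stamps = [
    datetime(2013, 4, 15, 19, 15),
    datetime(2013, 4, 15, 19, 19),
    datetime(2013, 4, 15, 19, 21),
]
for start, count in sorted(bin_counts(stamps).items()):
    print(start, count)
```

Running the function separately on the misinformation and correction subsets would give two series that can be plotted on a shared log-scaled axis.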
Figure 3: Girl Running Rumor, Tweets per 10 Minutes
*Light grey rectangle in top image highlights window of focus for bottom image 


4.2 Rumor 2: False Flag — Navy Seals or Craft Security or Blackwater Agents as Perpetrators 


After the FBI released images of dark, exploded backpacks, users of social media spaces like 4chan and 
Reddit began to collect and analyze images of suspicious individuals wearing backpacks at the scene. One 
set of images included two men standing together wearing the same clothes (caps, pants, and boots), 
carrying heavy dark backpacks of the style shown in the FBI photos. Some speculated these individuals 
might have been involved in a drill or in the actual attack (Watson, 2013). An emblem on one of their hats 
suggested to some that the men were affiliated with U.S. military special operations, specifically Navy 
SEALs. Later, speculation shifted to claims that they were agents of Blackwater or Craft International, 
both private military/security firms. Each explanation supported a larger claim — that the bombings had 
been a “false flag” attack, either staged or actually carried out by the U.S. Government (Watson, 2013). 

For Rumor 2, we coded tweets that contained either “navy seal” or “blackwater” or “black ops” or 
(“craft” and “security”). 
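
Written as code, the selection rule for Rumor 2 is a simple boolean expression. The function name `matches_rumor2` is hypothetical, and case-insensitive substring matching is an assumption; the terms themselves are those listed above.

```python
def matches_rumor2(text: str) -> bool:
    """Apply the Rumor 2 search terms to a tweet's text.

    Case-insensitive substring matching is an assumption made for this sketch.
    """
    t = text.lower()
    return (
        "navy seal" in t
        or "blackwater" in t
        or "black ops" in t
        or ("craft" in t and "security" in t)
    )


print(matches_rumor2("Craft International security contractors at the scene?"))
print(matches_rumor2("Visiting an arts and craft fair"))
```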
Figure 4: False Flag Rumor, Tweets per 10 Minutes


Rumor 2 had far fewer tweets than Rumor 1, only 4525 total. 3793 of these were misinformation, 212 were 
corrections, and 520 were coded as other, most of those being unrelated. 

The diffusion of Rumor 2 appears to progress somewhat differently from Rumor 1, peaking once at 
the beginning but then persisting and eventually gaining steam again at the end of the collection period. 
However, some aspects of the relationship between misinformation and corrections are consistent with 
Rumor 1—volumes of both often rise and fall together, though there is often a lag between the spike in 
misinformation and the resulting rise in corrections. Again the ratio of misinformation to correction was 
high (18:1), though significantly smaller than for Rumor 1 (Chi-square with Yates correction, p<0.0001). 
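
A chi-square test with Yates correction on a 2x2 table can be computed with the standard library alone. The sketch below assumes the comparison was made on the misinformation and correction counts reported for Rumors 1 and 2; the text does not state the exact table, so that layout, like the function name `yates_chi_square`, is an assumption.

```python
import math


def yates_chi_square(a: int, b: int, c: int, d: int):
    """Chi-square test with Yates continuity correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: Rumor 1, Rumor 2. Columns: misinformation, correction.
# Counts are those reported in the text; the table layout is an assumption.
chi2, p = yates_chi_square(90668, 2046, 3793, 212)
print(chi2, p, p < 0.0001)
```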

Because of the context of this rumor, it is unlikely that the corrections themselves had much effect 
in stemming the flow of misinformation. Even after the shoot-out, capture and identification of the Tsarnaev 
brothers, the rumor returned. Examining the content of correction tweets suggests that those who sent 
them were part of a separate conversation, criticizing the speculation but not interacting with those 
participating in it. 


4.3 Rumor 3: Digital Vigilantes — The Crowd Misidentifies Sunil Tripathi as a Bomber 


Sunil Tripathi was a 22-year-old Brown University student who went missing on March 16. In the weeks 
before the bombing, his family was actively searching for him, using social media and formal media outlets 
to raise awareness (Bidgood, 2013). 

Within a few hours of the FBI releasing the grainy surveillance photographs of the bombing suspects, a few 
different Twitter users claimed that Tripathi looked like Suspect #2. In the most notable example, a former 
high school classmate of Tripathi’s posted a tweet noting the resemblance (April 18 at 7:39pm). Shortly 
thereafter, a Reddit thread became focused around speculation of a connection between Tripathi and 
Suspect #2 (Reddit, 2013). The rumor continued into the early morning hours as the Watertown shootout 
was occurring. The following tweets fueled the spread: 


@ghughesca (April 19, 2:43am): BPD has identified the names: Suspect 1: Mike Mulugeta. Suspect
2: Sunil Tripathi. 


@KallMeKG (April 19, 2:50am): BPD scanner has identified the names : Suspect 1: Mike
Mulugeta Suspect 2: Sunil Tripathi. #Boston #MIT 


Neither statement was true, but within minutes many Twitterers, including some traditional media outlets, 
retweeted this misinformation (Madrigal, 2013b). Just before 6:40am, the FBI and news outlets released 
the Tsarnaev brothers’ names, effectively resolving the confusion, and the Boston Bombing conversation 
soon veered away from Tripathi. On April 23, Sunil’s body was discovered. He had died weeks before the
bombings (Bidgood, 2013). 

For Rumor #3, we coded tweets that contained either “Sunil” or “Tripathi.” This dataset of 29,416 
tweets had a much lower ratio of misinformation to correction—though still larger than Mendoza et al.'s 
(2010)—at only roughly 5:1 (22,819 misinformation to 4485 correction). In this case, the propagation of 
misinformation took a different shape, with misinformation compressed into a shorter time window. Between
8pm on April 18 and 2am on April 19, tweets linking Tripathi to the bombing were sent at a rate of about 
1-3 tweets per minute, most of which were speculative. Then, between 2:40am and 3am, misinformation 
jumped from 40 tweets in ten minutes to 4690. Peak volume, however, was not sustained as it was for the 
other two rumors, declining at an (exponentially) steady rate, with occasional mini-spikes.
Figure 5: Sunil Tripathi Rumor, Tweets per 10 Minutes


* Light grey rectangle in top image highlights window of focus for bottom image 


Remarkable here is the interaction between misinformation and correction. Again, corrections lagged behind 
misinformation in time and overall volume. However, they grew steadily and eventually overtook 
misinformation—soon after the official announcement naming the Tsarnaev brothers as suspects, which 
managed to effectively end the spread of misinformation within a few hours. Although data loss prevents a 
thorough analysis of how the rumor died off, we can see that corrections had already begun to overtake 
misinformation before the official announcement. For this rumor, corrections persist long after the 
misinformation fades as users commented on lessons learned about speculation. 
5 Conclusion: Re-evaluating the Idea of the Self-Correcting Crowd


5.1 Characterizing Misinformation 


Mendoza et al. (2010) demonstrated that the social media crowd has the potential to self-correct. Our study 
examines more closely the relationship between misinformation and corrections on Twitter. In support of 
Mendoza, we find evidence of crowd-correction for each rumor but with considerably smaller proportions of 
correction. Though misinformation and correction seem to rise and fall in tandem, they exhibit different 
magnitudes and a lag between the onset of misinformation and the correction. If we characterize these as 
patterns or signatures (Nahon et al., 2013), the frequency and wavelength of misinformation and correction
are often aligned, but the amplitude can differ by orders of magnitude and there is often a delay in the correction
signal. Additionally, in cases like Rumors 1 and 2, misinformation can persist at low levels that no longer 
activate significant corrections. 
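
One simple way to quantify the delay in the correction signal is to find the shift that best aligns the two binned series. The sketch below uses an unnormalised cross-correlation; it is an illustration of the idea, not a method used in this study, and the function name `best_lag` and the sample counts are hypothetical.

```python
def best_lag(misinfo, correction, max_lag=6):
    """Return the shift (in bins) at which the two series overlap most strongly.

    A positive result means corrections trail misinformation. The score is an
    unnormalised dot product, so it favours shifts with more overlapping bins.
    """
    def score(lag):
        if lag >= 0:
            pairs = zip(misinfo, correction[lag:])
        else:
            pairs = zip(misinfo[-lag:], correction)
        return sum(m * c for m, c in pairs)

    return max(range(-max_lag, max_lag + 1), key=score)


# Hypothetical counts per 10-minute bin, for illustration only.
misinfo_counts = [0, 5, 40, 10, 2, 0, 0]
correction_counts = [0, 0, 1, 8, 2, 0, 0]
print(best_lag(misinfo_counts, correction_counts))  # 1
```

Comparing peak heights of the two series alongside the estimated lag would give a rough description of the amplitude and delay discussed above.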


5.2 Future Work 


This is preliminary work in a larger research effort on understanding the propagation of rumors through 
social media. We eventually would like to develop methods for automatically identifying misinformation by 
detecting the corrections. In the immediate future, we intend to analyze a larger set of rumors related to 
multiple crisis events. We hope to identify patterns or common types of rumors, possibly using “signatures” 
or characteristic patterns of misinformation and corrections over time. We intend to examine links within 
tweets—the URLs themselves and the domains to which they belong—to see if these features offer insight. 
Preliminary work suggests that tweets with misinformation contain more links than tweets with corrections, 
but that corrections tend to link to a higher number of different sources. 
Abstract

Individuals may encounter distinct kinds of challenges in assessing credibility in a social Q&A setting 
where they interact with strangers. It is necessary to better understand how people make credibility 
judgments when seeking information using social Q&A services because people increasingly use such 
services to obtain personalized answers from a large pool of unknown people. In this paper, we report 
preliminary findings from a quasi-field study where participants were asked to use Yahoo! Answers for 
one week and were interviewed afterwards. We find that participants’ assessment of the credibility of 
strangers who answered their questions occurred in three different dimensions: attitude, trustworthiness, 
and expertise. Furthermore, different elements were noticed and interpreted in each dimension of the 
credibility assessment. Our work provides insights into source credibility assessment in social Q&A 
settings and implications for the design of social technologies that better support people’s online 
credibility assessment. 
1 Introduction


Today, online social tools and services enable people to easily reach the crowd to seek information in the 
context of their daily lives. An example of such online services is a social question-answering (Q&A) service. 
Social Q&A services such as Yahoo! Answers allow people to meet their information needs by asking 
questions and receiving answers from other users on a broad range of topics. People are increasingly using 
social Q&A services to seek information because these services enable them to obtain personalized answers 
to their questions quickly from a large number of people (Harper, Raban, Rafaeli, & Konstan, 2008; Shah, 
Oh, & Oh, 2008). 

Credibility research has found that many people find it difficult to judge the value and credibility 
of information based on author, content, and source on the Web due to a lack of quality control mechanisms 
and a limited number of available cues (Metzger, 2007; Rieh, 2002). In social Q&A settings, where people 
interact with people they do not know and with online content created by those people, individuals may 
encounter different challenges in judging the credibility of information. For example, when evaluating 
information on social Q&A sites, do people distinguish between the sources of information (i.e., answerers) 
and the content of answers? Do people become more dependent on new types of social cues in this process 
of finding credible answers? 

Prior work has addressed issues surrounding credibility assessment in social Q&A settings, such as 
the identification of criteria used to evaluate answers and the effect of particular cues on trust in the 
answerer (Golbeck & Fleischmann, 2010; Kim, 2010; Kim & Oh, 2009). However, we still know relatively 
little about how people make credibility judgments in this new online environment. The rapid recent growth 
of social tools and services that enable interactions with the crowd has magnified the importance of 
understanding people’s credibility assessment of strangers. 

We focus on examining how individuals judge the credibility of unknown people who answer their 
questions in a social Q&A setting. To address this question, we conducted a quasi-field study on a social 
Q&A service, Yahoo! Answers. The preliminary findings from the study indicate that participants’
assessment of the credibility of strangers who answered their questions occurred in three different dimensions:
attitude, trustworthiness, and expertise. Moreover, different elements were noticed and interpreted in
each dimension of the credibility assessment. 


2 Related Work 


As a principal component of information quality, credibility is the believability of some information and its 
source. It is a multi-dimensional construct with two main components: expertise and trustworthiness 
(Metzger, 2007; Rieh, 2010). Credibility is not a property of information or a source, but it is the judgment 
and perception of an individual (Metzger, 2007; Rieh, 2010). 

The prominence-interpretation theory proposed by Fogg (2003) suggests that online credibility 
assessment entails two phases: noticing an element and making a judgment about the noticed element. The 
former refers to prominence, while the latter refers to interpretation. Hilligoss and Rieh (2008) developed a 
theoretical framework of credibility assessment that includes three distinct levels of credibility judgments: 
construct, heuristics, and interaction. The construct level relates to how users conceptualize credibility. The 
heuristics level entails credibility assessment based on general rules of thumb. The interaction level involves 
effortful assessment of specific sources or content cues. 

In Web environments, it is difficult to identify or authenticate a source of information (Metzger, 
Flanagin, & Medders, 2010). Source attribution research has emphasized that the source of Web-based 
information is what or who one believes it to be (Sundar & Nass, 2001) and thus individuals tend to 
distinguish between different levels of sources, and salience of source attributes at the time of evaluation 
may affect people’s credibility assessment (Flanagin & Metzger, 2007). In this vein, a number of studies 
have examined the effect of source attribution on credibility assessment in the context of online reviews. 
People appear to be influenced by information describing reviewers’ identity or expertise that is available 
either in a profile or in the content of review when assessing helpfulness of online reviews and credibility of 
online reviewers (Forman, Ghose, & Wiesenfeld, 2008; Willemsen, Neijens, & Bronner, 2012). 

With regard to research on credibility judgments in social Q&A settings, studies have reported that 
people pick up affective cues such as attitude or tone, which are embedded in questions and answers (Kim, 
2010; Kim & Oh, 2009). Furthermore, any cues may be helpful for developing trust in online settings where 
there is no strong community or where users often lack long-term engagement, as is the case with social 
Q&A sites (Golbeck & Fleischmann, 2010). 


3 Methods 


A quasi-field study was conducted in order to obtain data drawn from participants’ experiences in the 
context of their daily lives. Yahoo! Answers (http://answers.yahoo.com/) was selected for this study because 
it is the largest and most popular social Q&A service. We instructed participants to use Yahoo! Answers 
for a period of one week and interviewed them at the end of that week. Twenty-one undergraduate
students (age range, 19 to 24 years) from a research university in the Midwestern United States participated
in this study.¹ Eight (40%) were male and 12 (60%) were female. The majority of participants (60%) had
little or no experience with Yahoo! Answers. 

Data were collected through a background questionnaire, interviews, and a post-interview
questionnaire. In particular, semi-structured in-person interviews served as the primary source of data
collection, gathering data about participants’ overall experience using Yahoo! Answers for this study and
their question asking and answer evaluation process in each episode. The content of questions submitted by
participants and answers they received were also collected.

¹ One participant was excluded from the data because the participant only answered questions in Yahoo! Answers without posting any questions.

All interviews were transcribed and coded. The initial set of codes was developed based on the 
interview protocols and then additional codes were added to the code book through iterative analyses of the 
interview transcripts. In the present paper, we report preliminary findings based on the analysis of interview 
data, focusing on participants’ credibility assessment of strangers who answered their questions in Yahoo! 
Answers. 


4 Findings 

While a small number of participants reported that they were not very concerned about the credibility of 
those who answered their questions, in general participants assessed the credibility of the answerer by 
utilizing the limited cues available in the social Q&A setting. Specifically, people’s perceived credibility of 
the answerer was constructed based on credibility assessment occurring in three different dimensions: 
attitude, trustworthiness, and expertise. In addition, the credibility assessment in each dimension was based 
on people’s interpretations of certain elements they noticed, as Fogg (2003) suggested. 


4.1 Three Dimensions of Credibility Assessment 


4.1.1 Attitude-Dimension of Assessment 


In the attitude-dimension, people assessed the answerer’s involvement and effort. In particular, people 
judged how invested the answerer was in Yahoo! Answers, how actively he or she participated and engaged, 
and whether he or she had worked hard on the answer. Elements people noticed in this dimension 
were cues that tended to require relatively little effort to notice. These included a profile picture, Yahoo! Answers 
points or levels, a top contributor badge, the act of answering itself, and the act of doing research. 

Although only seven of twenty participants utilized system-generated cues such as a profile and top 
contributor badge, those who did found them useful to gauge the level of involvement of the answerer. With 
regard to the profile, S01 indicated that uploading a profile picture meant that person is “a little bit more 
invested in actually participating in the site.” Similarly, S04 stated that having a profile picture showed the 
answerer’s investment of his time in Yahoo! Answers. Some participants used information on points or levels 
in Yahoo! Answers from the profile to judge the answerer’s involvement. For example, S04 said that those 
who had higher levels were “the people who spend more time on Yahoo! Answers.” Participants also 
perceived that those who had top contributor badges were users who were making large contributions to 
the site. However, some participants who did not use these cues voiced suspicions about 
the top contributor badge, stating that having it did not necessarily mean the answerer provided 
high-quality answers. 

Participants appeared to appreciate the fact that those who answered their questions took their 
time to answer them. Both S06 and S11 mentioned that the act of answering itself indicated that that 
person knew something and made an effort because that person spent time writing the answer. In a similar 
vein, S08 described the significance of effort in assessing the credibility of the answerer, stating that the answerer 
seemed to have done his research, given the content of the answer received. 


4.1.2 Trustworthiness-Dimension of Assessment 

With respect to trustworthiness, people judged the answerer’s intention or decency. These judgments were 
based on elements such as punctuation, wording, format, links, and the way of answering. Compared to 
elements perceived in the attitude-dimension, these elements required more effort because participants 
needed to read the content of the answer in order to notice and interpret cues. 

Participants believed that the way that the answerer typed and punctuated, and the answerer’s 
word choice and formatting style determined the legitimacy of the answerer. For example, S04 stated that 
“if people write out the punctuation, that means that they want to avoid spam.” Furthermore, participants 
considered those who included sources such as links to websites more trustworthy as these answerers 
provided objective evidence that supported their answers. Interestingly, one participant (S21) mentioned 
that she perceived the answerers were unbiased and thus trustworthy because they were strangers who 
knew nothing about her. 

In addition, for some participants, the way of answering mattered in assessing the answerer’s 
credibility. S08 stated that he could believe the answerers because “they’re not trolling here.” Similarly, 
S10 reported that the answerer who was “making a joke” or “acting like it’s a message board” lost his or 
her credibility. In contrast, some participants had a fundamental belief that people were well-intentioned, 
given that they did not think that “people take time out to answer someone else’s question to lie (S06).” 


4.1.3 Expertise-Dimension of Assessment 

When assessing expertise, participants evaluated the perceived knowledge or experience of the answerer. 
Participants noticed and utilized a wide range of cues to decide whether the answerer had the necessary 
expertise, knowledge, or experience to answer their questions. These elements required the most effort, in 
comparison to elements perceived in the previously mentioned two dimensions, as people needed to read 
and process the content of the answer or to take the extra step of clicking a link to access more detailed 
profile information. 

The content of an answer itself played an important role in helping participants to assess answerers’ 
qualifications. S06 stated that self-proclaimed expertise in the answer made him think that “he knows what 
he’s talking about.” Providing a specific answer which exactly met the needs of the person who asked the 
question also seemed to indicate answerers’ experience, as S08 reported. Another content-related cue used 
by participants was congruence between the answerer’s response and that of other users who provided 
answers. S07 said that she believed the answerer because “there was already like two [other] people that 
said the same thing.” 

Along with content-related cues, system-generated cues based on social feedback were also used. 
For example, some participants went to the answerer’s profile and looked at other questions that person 
had answered previously. According to S05, who posted a track-related question, the fact that the answerer 
“answered some other questions about track” indicated that “it’s not a random person answering,” 
enhancing the answerer’s credibility. Similarly, S03 said that it seemed obvious that the answerer had “some 
sort of experience or some sort of knowledge in finance,” as this person “answered a lot of questions related 
to finance.” 


5 Discussion and Conclusion 


We have presented the preliminary findings from a quasi-field study conducted on Yahoo! Answers. The 
preliminary findings demonstrate that people employ the limited cues available in Yahoo! Answers to assess 
the credibility of strangers who answer their questions. This credibility assessment takes place in three 
different dimensions with different elements being noticed and interpreted in each dimension. 

Research on question asking using social network sites such as Facebook and Twitter has shown 
that people prefer to obtain an answer from those in their social networks over unknown people because 
they tend to trust the opinions of people they know (Morris, Teevan, & Panovich, 2010). In Yahoo! Answers, 
people interact with strangers they do not know and have no prior relationship with; thus, the asker is 
responsible for assessing the credibility of the answerer. Our preliminary findings provide insights into what 
kinds of cues people use in order to perceive the credibility of the source of information, strangers who 
answer their questions, in a social Q&A setting. Moreover, by identifying multiple dimensions of source 
credibility assessment in the social Q&A setting and discovering elements that people notice, this study 
helps to inform social technology designers about what elements need to be salient to better support people’s 
credibility assessment. 

Future work will be needed to develop a more nuanced understanding of people’s credibility 
assessment of the crowd in the social Q&A context. It would be interesting to examine how these three 
dimensions of assessment interact in the credibility assessment process to affect the perceived credibility of 
information obtained in the social Q&A setting. In addition, we could consider the degree of effort expended 
by an individual to make credibility judgments in each dimension. This might allow us to develop a new 
credibility assessment framework applicable to the social media environments based on Hilligoss and Rieh’s 
(2008) framework. 

In spite of several limitations of this study, including homogeneity of participants, the artificial 
number of questions to be posted that was imposed on participants, and selection of one particular social 
Q&A site, we believe that our work contributes to a better understanding of people’s credibility assessment 
in the social Web environment by identifying specific elements people may notice and interpret in order to 
make credibility judgments about strangers in the social Q&A context. 
Abstract 

Understanding children’s digital play in immersive virtual spaces, specifically those with limited 
communication affordances, demands new methods and approaches that move beyond interviews and participant 
observation. This paper illustrates the process of creating machinima videos of scripted play scenarios as 
“cultural probes” to elicit young users’ insider knowledge of communication and socialization practices. 
We discuss our ongoing development and use of these videos, supplementing other qualitative methods 
to develop a richer understanding of information sharing, particularly non-verbal communicative action. 
1 Introduction 


If a pink penguin wearing a tutu started chasing you and throwing hearts, how would you react? While 
most of us do not encounter such behavior in our physical lives, children’s digital play is filled with similar 
interactions. Every day millions of children log in to virtual worlds where they play, socialize, create, and 
explore a digital landscape as avatars, or “virtual characters.” Virtual worlds designed expressly for children 
ages 5-10 years comprise the largest and fastest growing segment of this web genre. These environments 
represent a new space for childhood socialization rich in interactivity, but they differ in many ways from 
the communication spaces where most children develop their understanding of language and its use: the 
home, the playground, and the classroom. Many virtual worlds for children have constrained conversation 
features that restrict the way users engage with each other. This makes non-verbal behaviors, including the 
use of emoticons, avatar movements, and playful gestures, that much more important to the way meaning 
is constructed. In these safe yet verbally limited play environments, conversational practices emerge from a 
blend of verbal and non-verbal cues. 

To date, much of the research analyzing virtual worlds for children has employed focus groups and 
interviews, through which children describe their online behaviors through recollection (Marsh, 2010). New 
research methods are needed to understand the different ways that children employ site features and 
communicative affordances of virtual worlds to engage with other users. In particular, ephemeral interactions 
such as flirting, teasing, harassment, and aggressive or unwanted contact are difficult to capture in words 
and to communicate reliably. Part of the challenge, then, is bridging child and adult constructions of the 
same event or phenomenon, a problem identified in studies of young people’s social media use more broadly. 

Our project is working to overcome some of these challenges through machinima, the process of 
creating videos from screencasts of virtual play (AMAS, 2005). We use scripted screen capture to create 
reusable behavior scenarios that become the focus of individual and group interviews about play and 
information exchange in-world. Scholars in HCI, educational technology, and librarianship have proposed and 
examined the use of machinima in learning environments (Bardzel et al. 2006; Daly-Swansen, 2007; de 
Frietas, 2006; Middleton, 2009; Snelson, 201), but these studies have examined adult users. This study 
breaks new ground by exploring how machinima objects can mediate conversations with young people for 
the purpose of understanding situations that are sometimes difficult to put into words. We argue that such 
video artifacts can be considered “cultural probes”, a design tool used in HCI to elicit tacit understandings 
of the users of information systems and services. Our paper will discuss the process by which we identify 
scenarios of interest; script, act out, and record these scenarios using avatars in the virtual world; and then 
employ the resulting machinima videos in our empirical investigations of non-verbal communication with youth 
informants. This work seeks to understand how communicative affordances translate into informative and 
communicative intents; in other words, how tools facilitate translating thoughts into action and subsequent 
meanings (Denkel, 1980). 


2 Theoretical Framework 


Our work is informed by theories of informative and communicative intent (Bratman, 1987; Malle et al., 
2001). Engaging with other participants in a virtual world requires a user to infer meaning from a series of 
cues that can take a number of forms, including textual exchanges, avatar movements and gestures, and 
emoticons. Making meaning from these aggregate signals depends on the experience of the user, including 
the user’s ability to relate in-world gestures to out-world gestures, as well as knowledge of in-world 
norms and practices, such as the use of “l33t speak”, slang, or jargon. Communication in a virtual world is 
then dependent on two levels of meaning making: in-world comprehension of digital action and a translation 
of that meaning to out-world significance. The two levels of meaning combined become a “literacy” of digital 
play. To be fluent in this kind of literacy requires immersion in the culture of the space. It is this culture 
of play that we wish to investigate with the machinima probes. 


How do you get a boy friend on Club Penguin? 
In: Social Network Websites, Club Penguin 

A: oh thats easy. theres a lot of ways to do this. 

1. === go up to the guy u like. make sure hes a guy by saying: “hi, boy or girl?” normally he will know ur hitting 
on u after that, but he will try to be cool about it. u stay cool by adding im to ur friends list and then just 
leaving. after a few minutes, find him and act like u didnt know he was there. start a conversation with him like 
this: “oh hi! lol i didnt see u there. sooooo what's up?” eventually, when things get closer, invite him to ur 
igloo. when ur both there and walking around, make the heart icon. === 

2. === dress up as best and as sexy as u possibly can. open ur igloo up. then go to town, or where ever there's 
a lot of guys. say: “WIN MY HEART AT MY IGLOO/IGGY!!!” sometimes it takes a long time for ppl to show up, 
but other times a TON of ppl will be in ur igloo! ask the boys questions like: “what's ur fave color/sport/kind of 
music/song/band/ ect.” then pic the one u like best! === 

3. === just sit there, looking sexy. he'll come. === 

HOPE THAT HELPS!!! :D 


Figure 1: How to get a boy friend in Club Penguin 


For example, a participant may wish to establish a relationship with another user, a common scenario in 
virtual worlds, even those involving young children. We know from children’s role-play that acquiring a 
“boyfriend” models normative gender roles in real play, and thus is frequently seen in digital play as well. 
The informative intent is the user’s intention to inform the audience (other users) of something, or to induce 
a belief in other users (e.g., I want to have a boyfriend). As a further means of fulfilling this intention, the 
participant may communicate this intent, explicitly or implicitly. Figure 1 is a screen shot from a Yahoo! 
Answers stream in which a participant explains how to translate informative intent (availability and 
desire) into communicative actions (flirting with other users). 
As this communicative engagement requires multiple parties to understand a series of cues and 
their meaning in context, unpacking these ephemeral moves can be problematic. However, controlling the 
creation, playback, and reflection upon communicative scenarios by reifying the communicative action 
allows us to develop a better understanding of children’s inferences. 

“Cultural probes” is one way of describing this kind of research instrument. Gaver and colleagues developed 
the technique to examine how users reflect on their own practices by surfacing tacit understandings. Cultural 
probes are a common approach in HCI (Gaver et al., 1998; 2004); however, they are less often employed in 
educational research, and then only with older learners. 

Studying virtual spaces for children is fraught with difficulty for the researcher who wishes to 
respect the rights of children while gathering rich data on youth behaviors. By nature of their design, 
children’s social play spaces such as Club Penguin and Pixie Hollow protect user privacy and safety by 
limiting participant interactions. Unlike studies that take place in Second Life, users cannot consent to 
participate in research, engage in interviews or voice chat, nor share artifacts and documents. This often 
confines the researcher’s role to engaging with children outside the world, as one cannot be sure those in- 
world are even children. 

This project arose from the desire to gain new insights into sensitive topics (flirting, harassing, and 
“mean girl” behavior) while recognizing the limitations of children to describe these situations and adults 
to mediate conversations about these topics with younger participants ages 7-12 years. We are attempting 
to work at the intersection of three spaces: the ethically responsible, the technically feasible, and the legally 
permissible. We have worked with our Ethics Review Board to design work we feel fits these requirements. 


[Screenshot showing avatars labeled Pecan Neverwillow and Hale Cloudriver] 

Figure 2: Fairies Interact in Pixie Hollow 


We first spent several months exploring the issues we hoped to investigate, and took notes on the 
communicative practices of users through participant observation in several virtual worlds. We settled on two 
worlds in which to construct our machinima simulations: Club Penguin and Pixie Hollow. These two worlds 
have significant user populations, have broad participation among children ages 6-12, and permitted us to 
explore two variables of interest: avatar form and social status. Pixie Hollow employs hominid avatars of 
boys and girls (Fairies and Sparrow Men), while Club Penguin allows users to choose a single form: an 
anthropomorphized penguin. We hypothesized that the variation in the avatar’s form may affect the 
communicative affordances of users, and thus developed user scenarios that could be played out in either world. 
We scripted two scenarios: 1) a flirting interaction between two users, and 2) a commercial interaction 
where a user engages another user by “begging” in the gift store. Figure 3 shows what a virtual “stage” 
looks like in these probes. Several other scenarios have been proposed in our research group, including a 
role-play party, but these are not yet in development. 

Two research assistants “played out” these scenes in-world using avatars created specifically for the 
scenario. This required the assistants to enter the same world and server using two computers, one of which 
was using Camtasia screen capture software to record the interaction. The resulting videos were minimally 
edited using Camtasia and iMovie ’11 into final clips. The intent of the editing was to clean the videos and 
add an opening title. Unlike cinematic users of machinima, our goal was not to alter the point of view or 
framing of the interactions. Each scenario lasts less than 90 seconds, but employs 4-6 types of communicative 
intents, including the use of text chat, emoticons, gifting, avatar proximity, and profile views. Our 
presentation will discuss several lessons learned in the development of these videos, including shooting schedules, 
rehearsal and storyboarding, and scripting for authenticity. 


3 Findings 

We are in the beginning stages of deploying our prototype machinima scenarios with participants with the 
intent of refining them and using them systematically in a study of preteen virtual world behaviors. To 
date, we have interviewed five young people (one boy age 9, four girls ages 8-12), a convenience sample 
recruited through parents in our school’s degree programs. Thus, our findings regarding this method’s 
usefulness are very preliminary. Nonetheless, even with this small sample, several themes are emerging from our 
work that suggest this technique is both viable and engaging, eliciting insights from youth that add value 
to our research on online interactions. Below we document several themes that emerged in our initial 
conversations with young users. 


3.1 “Oh, that’s obvious...” 


We introduced our videos to participants with a rather simple prompt: “Can you tell me what’s happening 
here?” and then followed up with several additional prompts depending on how the young person engaged 
in describing the activity on screen. Two of our participants read the scenes very quickly and described the 
phenomena consistent with how we designed the scenes to be “read”. However, the variation in 
our participants’ understanding of the different communicative intentions suggests that the scenarios were 
not as obvious as we, the adults, perceived them to be. What we regarded as fairly straightforward activity 
was confusing to our male participant. As we test our scenarios with more young people, we may find 
variations in ability to read and interpret social situations based on age, gender, and level of experience. At 
this point our sample is too small to make claims. 


3.2 “That’s not how I do it...” 


We received several valid critiques of our initial foray into scripted scenarios, with several of our female 
participants indicating that they approached flirting and interacting with other users in a different fashion. 
While our videos were based on participant observation by three research assistants over hundreds of 
play hours, it is likely that our scenarios will not match every experience. Nonetheless, to the extent that 
the videos prompted reflection and discussion of participants’ own styles of interaction, the probes were an 
initial success. One of the female participants offered, “I could show you right now!” 


3.3 “I could do that...” 


Although several of our participants had viewed videos of virtual worlds on YouTube, none of them had 
actually made a machinima video before. However, viewing and engaging with our videos piqued their 
curiosity and inspired interest in developing their own machinima. An extension of this work may be a 
machinima workshop for youth in which they develop their own scripts and role-play artifacts to share with 
others. As our work develops, we may move in the direction of a more participatory movie making 
experience, with the outcomes including youth-focused creativity and expression, in addition to the basic research 
design originally envisioned. 

iConference 2014 Eric M. Meyers 

An important limitation of the work to date is that we have used these scenarios only with a small 
sample of children. However, we feel that the work is already a contribution to the field in terms of a 
thought piece and methodological provocation: how can we expand the range of techniques through which 
we engage with young people to explore their meaning making practices? We suggest that machinima probes 
are one way forward in developing multi-method approaches to the study of online interactions in immersive 
spaces. 


4 Conclusions 


In a world of phatic communication—tweets, pokes, and status updates—it is vital that we understand the 
role of communicative intents in virtual spaces for children. This method may provide a reliable, replicable 
approach to studying children’s interpretations of virtual interaction that neither compromises the rights of 
children, nor violates system use policies. For many young children, these virtual worlds may be their first 
foray into web-mediated communication; thus, the practices that emerge from these spaces hold special 
significance in the development of online literacies. 

The development of replayable scenarios as a research tool to probe children’s cultural practices 
allows us to extend our work into areas that have not been examined to date, including how early users of 
social media make sense of a variety of social stimuli. In an effort to bridge research and practice, we hope 
to develop a kind of pedagogy from these research objects. Future work will involve the design of a curriculum 
that employs these materials to teach elementary-age children about online behaviors, with a focus on 
developing inferential reasoning rather than the memorization of platitudes. 
Abstract 

In this paper, we took IPL2 and Yahoo! Answers as the two samples for our case study of digital reference 
services and community-based Q&A sites. We examined the services of the two systems based on 200 real 
questions raised in IPL2 and the similar questions found in Yahoo! Answers. Question type, topic 
classification, answer type, and answer time delay were compared between the similar questions on these 
two platforms. The analysis showed that the two systems classify their questions differently, and that 
the types of questions asked also differ. It took much longer to obtain answers from IPL2, 
whereas different types of questions in Yahoo! Answers generated dramatically different response times. 
Some of these differences also suggest a need to consider integrating certain ideas from 
the two systems. 
1 Introduction 


With the widespread use of the Web, people increasingly seek information online in their professional and 
personal lives. Web search engines play important roles in satisfying people’s information needs, but 
they are still limited in handling complex questions and needs. Therefore, asking questions 
of and obtaining answers from a human being (rather than a machine, as in search engines) has also been 
extended to the Web, and has developed into two commonly used platforms: online digital reference, 
extended from traditional face-to-face library reference services (Pomerantz, Nicholson, Belanger, & 
Lankes, 2004), and community-based question answering (Q&A), which resembles asking questions among 
friends (Liu, Bian, & Agichtein, 2008; Y. Liu, et al., 2008). 

Many digital reference services and community-based Q&A systems have been developed over the 
years. Both platforms have also drawn a great deal of research interest, covering areas such as the 
characteristics and services of digital reference (Janes, 2002; Lankes, 2004; Pomerantz, et al., 2004), 
questions and answers in community-based Q&A sites (Fichman, 2011; Gazan, 2007, 2011), and the 
comparison of services provided by the two platforms (Wang, 2007; Wu & He, 2013). 

Our ultimate research goal is to explore the integration of these two services into one coherent 
framework so that the strengths of one can compensate for the weaknesses of the other. As a preliminary study 
toward this goal, this note paper therefore focuses on exploring the problem space with the following 
research question: 

• RQ: What are the similarities and differences in terms of question type, topic classification, askers’ 
intentions, answer types, and answer time delay between the similar questions actually asked on these 
two platforms? 

The motivation for concentrating on similar questions is that existing studies in the literature 
examine the connections between the two platforms either in general (Wang, 2007) or through a set of 
carefully crafted experiment questions (Wu & He, 2013). However, no study has examined the 
connections based on similar questions that actually occurred on both sites. We emphasize “truly 
occurred questions” because these questions tell us more accurately what is actually happening 
on those platforms. We paid attention to “question type, topic classification, askers’ 
intentions, answer types, and answer time delay” because these are important (though incomplete) 
parameters for examining the possibility of integrating digital reference services and community-based 
Q&A services. 

In this study, we adopted the case study as our main research method, and selected IPL2 and Yahoo! 
Answers as the two typical cases for collaborative digital reference and community-based question answering, 
respectively. IPL2¹ was developed in January 2010 by the School of Information Science and Technology of 
Drexel University by combining IPL (Internet Public Library) and LII (Librarians’ Internet Index). Yahoo! 
Answers² is one of the most popular English community-based Q&A sites, with a strong reputation, 
a large user population, and active Q&A communities. Each represents a top-quality, well-known 
system in its own type of service. Although we acknowledge potential questions about the 
generalizability of our results due to our case study method with just two samples, we think that the 
results are still valuable for a preliminary study. 

To obtain similar questions from the two platforms, we first selected the 200 questions in the last 
month (June 1-30, 2011) of our IPL2 transaction logs. Then we developed queries based on each of these 
200 questions and searched in Yahoo! Answers for similar questions. We manually judged the similarity 
between the returned Yahoo! Answers questions and the original IPL2 questions, and kept only those 
returned Yahoo! Answers questions that were judged to be similar enough. Next, for each IPL2 question, 
we ranked the remaining Yahoo! Answers questions according to their similarity to the original IPL2 
question, and also according to the time at which these questions were posted in Yahoo! Answers. To keep 
the study manageable, we sampled at most three similar Yahoo! Answers questions for each IPL2 
question. That is, if three or fewer similar Yahoo! Answers questions were found, we kept all 
of them. When there were more than three similar Yahoo! Answers questions, we retained the one 
that was most similar, the one with the earliest posting time in Yahoo! Answers, and the one closest in time to 
the question asked in IPL2. However, not every IPL2 question had a similar counterpart in Yahoo! Answers. 
In total we located 157 Yahoo! Answers questions by 15 August 2012. This gave us 157 pairs of 
similar questions between IPL2 and Yahoo! Answers. 
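
The sampling rule described above can be sketched in code. This is only an illustration, not the procedure the authors ran: similarity was judged manually in the study, so the numeric `similarity` score, the `Candidate` class, and the function name are assumptions introduced here. The text does not say what happens when one question satisfies several criteria; the sketch simply keeps it once, so it may return fewer than three questions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Candidate:
    """A Yahoo! Answers question judged similar to one IPL2 question."""
    question_id: str    # hypothetical identifier
    similarity: float   # hypothetical score standing in for the manual judgment
    posted: datetime    # time the question was posted in Yahoo! Answers


def sample_candidates(candidates, ipl2_asked, limit=3):
    """Keep at most `limit` similar questions for one IPL2 question."""
    if len(candidates) <= limit:
        return list(candidates)
    most_similar = max(candidates, key=lambda c: c.similarity)
    earliest = min(candidates, key=lambda c: c.posted)
    closest = min(candidates, key=lambda c: abs(c.posted - ipl2_asked))
    picked = []
    for candidate in (most_similar, earliest, closest):
        if candidate not in picked:  # one question may meet several criteria
            picked.append(candidate)
    return picked
```

Ranking by both similarity and posting time, as the text describes, is what makes the three picks well defined when more than three candidates remain.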

We do acknowledge that the above method is just one of many approaches for finding similar 
questions between the two platforms. For example, we could start with questions in Yahoo! Answers, and 
then find similar questions in IPL2. However, since IPL2 contains far fewer questions than 
Yahoo! Answers, we think that starting from IPL2 gives us a better chance of 
finding similar questions on both platforms. 


2 Comparison and Discussion 


As stated, we compare IPL2 and Yahoo! Answers on question type, topic classification, 
askers’ intentions, answer types, and answer time delay. These comparisons fall into two groups: 
comparison of questions and comparison of answers. 


1 http://www.ipl.org/ 
2 http://answers.yahoo.com/ 
Rank of   Question Type            IPL2       IPL2         Rank of   Yahoo!     Yahoo! 
IPL2                               Question   Question     Yahoo!    Answers    Answers 
                                   Number     Percentage   Answers   Question   Question 
                                                                     Number     Percentage 
1         Exploratory question     51         25.5%        3         29         18.5% 
2         Factual question         43         21.5%        4         23         14.6% 
3         Informational question   36         18%          1         59         37.6% 
4         Navigational question    35         17.5%        2         35         22.3% 
5         List question            21         10.5%        6         2          1.3% 
6         Definition question      14         7%           5         9          5.7% 
Total                              200        100%                   157        100% 


Table 1: Question Types of IPL2 and Yahoo! Answers 


2.1 Comparison of Questions 

Borrowing ideas from several existing works (Gazan, 2011; J. Pomerantz, 2005; Voorhees, 2002), we classify 
the questions into six types: Factual questions, List questions, Definition questions, Exploratory 
questions, Informational questions and Navigational questions. Table 1 shows the distribution of the 
questions from both IPL2 and Yahoo! Answers. The first impression is that the IPL2 questions and their 
similar Yahoo! Answers questions share similar distributions on many question types. For example, List questions 
and Definition questions are among the smallest in percentage. However, we also notice that the most 
common question type in IPL2 is Exploratory questions (25.5%), with Factual questions and 
Informational questions second and third. In contrast, the most common Yahoo! Answers questions 
are Informational questions (37.6%), followed by Navigational questions and then Exploratory questions. 
This suggests that IPL2 questions are in general more complex and difficult to answer, whereas 
Yahoo! Answers questions most often aim for readily available information. 
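
The percentages and ranks in Table 1 follow directly from the question counts. A minimal sketch of that arithmetic is given below; the function and variable names are illustrative and not part of the study.

```python
def distribution(counts):
    """Map each question type to (percentage, rank); rank 1 is the largest count."""
    total = sum(counts.values())
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {
        qtype: (round(100 * counts[qtype] / total, 1), ordered.index(qtype) + 1)
        for qtype in counts
    }


# Question counts as reported in Table 1.
ipl2 = {"Exploratory": 51, "Factual": 43, "Informational": 36,
        "Navigational": 35, "List": 21, "Definition": 14}
yahoo = {"Exploratory": 29, "Factual": 23, "Informational": 59,
         "Navigational": 35, "List": 2, "Definition": 9}

for name, counts in (("IPL2", ipl2), ("Yahoo! Answers", yahoo)):
    for qtype, (percentage, rank) in distribution(counts).items():
        print(f"{name:15} {qtype:14} {percentage:5.1f}%  rank {rank}")
```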


Subject Areas of the Questions 

IPL2: Science; History; Literature; Other; Library; Business; General Reference; Humanities; 
Education; Biography; Geography; Government; Health; Entertainment/Sport; 
Sociology; Computers; Internet; Music; Religion; Politics; Hobby; Household/Do-It-Yourself; 
Psychology; Military 

Yahoo! Answers: Arts & Humanities; Beauty & Style; Business & Finance; Cars & Transportation; 
Computers & Internet; Consumer Electronics; Dining Out; Education & Reference; 
Entertainment & Music; Environment; Family & Relationships; Food & Drink; Games 
& Recreation; Health; Home & Garden; Local Businesses; News & Events; Pets; 
Politics & Government; Pregnancy & Parenting; Science & Mathematics; Social 
Science; Society & Culture; Sports; Travel; Yahoo! Products 


Table 2: Question Subject Categories and Numbers in IPL2 and Yahoo! Answers 
Both IPL2 and Yahoo! Answers classify questions by subject area (see Table 2). 
Based on the 157 pairs of similar questions, we compared the classifications of the similar questions in the 
two systems. Using the Yahoo! Answers subject categories as the base, Table 3 shows the number of pairs that 
have a consistent subject label only at the top-level subject category in Yahoo! Answers, only at the 
second-level category, at both levels, and at neither level. The results show that 
the majority of the pairs have different category labels (106 out of 157), but some questions 
are classified similarly at both levels (13 out of 157) or at least at the top level (23 out of 157). 
Therefore, it would take a considerable amount of mapping effort to connect IPL2’s subject categories with those 
of Yahoo! Answers, but some subject categories can serve as starting points for the integration. 


Question Type             Consistent with   Consistent with   Consistent with   Not Consistent 
                          the 1st Level     the 2nd Level     Both Levels in    with Any Level 
                          Category in       Category in       Yahoo! Answers    Category in 
                          Yahoo! Answers    Yahoo! Answers                      Yahoo! Answers 
Exploratory questions     6                 4                 4                 36 
Factual questions         12                1                 0                 13 
Informational questions   0                 1                 3                 26 
Navigational questions    2                 5                 1                 19 
List questions            1                 3                 3                 10 
Definition questions      2                 1                 2                 2 
Total                     23                15                13                106 


Table 3: The Classification Comparison between IPL2 Questions and Yahoo! Answers Questions 


Morris et al. (2010) showed that 52% of their respondents used their social networks with the intention to 
ask for recommendations or other types of opinionated questions, whereas in general these kinds of 
opinionated questions are not common in library references. We examined the intention behind the 200 
IPL2 questions, and only identified three questions (1.5%) as opinionated questions. In contrast, we found 
that 54 of the 157 Yahoo! Answers questions (34.4%) were either seeking personal recommendations or 
subjective opinions. This result is consistent with Morris et al.’s findings. It seems that people often view digital 
reference and online social Q&A systems as two different services. Our further examination of these 
questions revealed that most of the opinionated questions in Yahoo! Answers are exploratory questions with 
open-ended answers. More study is needed on how to support such information needs. 


2.2 Comparison of Answers 


Depending on whether the returned information contains direct answers or related references/links to look for 
answers, we divided the answers to the questions from IPL2 and Yahoo! Answers into five types: direct 
answers, related references/links, both, no answers, and others. Table 4 shows that the answers to most IPL2 
questions (171 out of 200) contain related references/links, which helps to establish the authenticity and 
authority of the answers. This is due to the professional training of the reference librarians in IPL2. In 
contrast, answers from Yahoo! Answers most often contain only direct answers without any references or 
links. This is useful for users who just want ready answers, but it makes it difficult for users to 
establish the correctness and authority of the answers. This is true for both the selected best answers and 
non-best answers in Yahoo! Answers. Therefore, users of Yahoo! Answers need better support in judging 
answer quality. 

The answers to 19 IPL2 questions were classified as “others”. The most common instances of 
“others” include 1) pointing users to the FAQ available in IPL2 and online, 2) failing to find the answer 
and therefore describing what searches had been done, and 3) pointing out that third parties (such as original sales 
staff) should be contacted rather than IPL2. Therefore, we can see that the professional training of IPL2 
librarians helps users even when the questions cannot be answered well. In Yahoo! Answers, by contrast, 
there are cases in which even the voted best answers are wrong or offensive. 


                                     Direct    Related            Both   No        Others   Total 
                                     Answers   References/Links          Answers 
IPL2                                 8         98                 73     2         19       200 
Best answers of Yahoo! Answers       114       23                 14     3         3        157 
Non-best answers of Yahoo! Answers   84        9                  17     0         1        111 


Table 4: Answer Types of Questions in IPL2 and Yahoo! Answers 


It takes time to answer a question, and the delay before a question is answered could affect people’s 
impression of Q&A services. IPL2 usually returns a single answer to the asker, which establishes the first answer 
time. Sometimes, however, the asker and the librarians conduct follow-up interactions until the last answer 
is given. We view the time of this last reply containing an answer as the best answer time in IPL2. If there is only 
one reply from IPL2 to the user, the first answer time is also the best answer time. Yahoo! Answers usually 
provides multiple answers, one of which has the earliest answering time (thus the first answer time) and one 
of which is voted as the best answer (thus the best answer time). In both IPL2 and Yahoo! Answers, the 
difference between the first (or best) answer time and the time at which the question was 
asked is the time delay for the first (or best) answer. 
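
The two delay measures defined above can be written down compactly. The sketch below is an illustration under stated assumptions: the function names and the `best_index` parameter are hypothetical, and timestamps are assumed to be available as `datetime` values. For IPL2 the best answer is the last reply; for Yahoo! Answers it is whichever answer was voted best.

```python
from datetime import datetime


def delay_minutes(asked, answered):
    """Minutes elapsed between asking a question and receiving an answer."""
    return (answered - asked).total_seconds() / 60


def answer_delays(asked, answer_times, best_index):
    """Return (first-answer delay, best-answer delay) in minutes.

    answer_times: times of all answers to one question.
    best_index:   position of the best answer in answer_times
                  (-1 selects the last reply, as in the IPL2 definition).
    """
    first = min(answer_times)
    best = answer_times[best_index]
    return delay_minutes(asked, first), delay_minutes(asked, best)


asked = datetime(2011, 6, 1, 9, 0)
# Yahoo! Answers style: several answers, the one at index 0 voted best.
answers = [datetime(2011, 6, 1, 10, 0), datetime(2011, 6, 1, 9, 30)]
print(answer_delays(asked, answers, best_index=0))   # (30.0, 60.0)
# IPL2 style: a single reply, so both delays coincide.
print(answer_delays(asked, [datetime(2011, 6, 9, 9, 0)], best_index=-1))
```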

Table 5 shows that the difference between IPL2’s average time delay for the first answer and that for the best 
answer is 765 minutes (close to 13 hours), which is relatively small considering that the average time delay for 
the best answer is 11602 minutes and that for the first answer is 10837 minutes (both roughly 8 days). 
This means that follow-up interactions after the initial answers are uncommon and often short. 
Therefore, the roughly 8-day delay in obtaining the first answer is the biggest issue in IPL2. In contrast, 
Yahoo! Answers, with its participatory design, requires on average only about 8 hours to produce the 
first answer and about 15 hours for the best answer. Therefore, Yahoo! Answers has a very clear advantage 
over IPL2 in replying quickly to askers. This is probably a clear angle for the integration of digital 
reference and community-based Q&A. 
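
As a quick arithmetic check on the hour and day figures quoted above, the average delays in Table 5 can be converted from minutes. The helper names below are illustrative only.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * 60


def to_hours(minutes):
    return minutes / MINUTES_PER_HOUR


def to_days(minutes):
    return minutes / MINUTES_PER_DAY


# Average delays in minutes, taken from Table 5.
ipl2_best, ipl2_first = 11602, 10837
yahoo_best, yahoo_first = 913, 495

gap = ipl2_best - ipl2_first
print(f"IPL2 gap between best and first answer: {gap} min = {to_hours(gap):.2f} h")
print(f"IPL2 best answer:   {to_days(ipl2_best):.1f} days")
print(f"IPL2 first answer:  {to_days(ipl2_first):.1f} days")
print(f"Yahoo! best answer:  {to_hours(yahoo_best):.1f} h")
print(f"Yahoo! first answer: {to_hours(yahoo_first):.2f} h")
```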


                 Time Delay for the Best Answer     Time Delay for the First Answer 
                 average    max        min          average    max        min 
                 (minute)   (minute)   (minute)     (minute)   (minute)   (minute) 
IPL2             11602      35389      52           10837      35236      [illegible] 
Yahoo! Answers   913        11520      2            495        11520      1 


Table 5: Time Delay for Obtaining Answers in IPL2 and Yahoo! Answers 


We further correlated time delay with question types. Interestingly, as shown in Table 6, question type 
made little difference in IPL2. The only noticeable differences are in the delay to the 
first answers for List questions and Navigational questions. If we had to pick the question type on which IPL2 
spends the most time, it would be Exploratory questions, for both the first answer and the best answer. This probably 
makes sense, since Exploratory questions in general need more time to answer. 

In contrast, question type makes a dramatic difference in Yahoo! Answers. Definition 
questions had the quickest answer time for both first answers and best answers, taking less than 2 hours 
on average. Exploratory questions took a long time to answer, but they were not the slowest in Yahoo! 
Answers (ranked fourth for the best answer and second for the first answer), nor in a range comparable with 
IPL2 (9-11 hours vs. 8-9 days for both the first and the best answers). It is interesting to see that 
Navigational questions had the longest delay in Yahoo! Answers: about 12 hours for the first 
answers and 24 hours for the best answers. We cannot yet explain this, so further study is needed. 


Question Type             Delay to the best answer        Delay to the first answer 
                          average time (minute)           average time (minute) 
                          IPL2       Yahoo! Answers       IPL2       Yahoo! Answers 
Definition questions      11221      105                  11021      105 
Exploratory questions     12665      781                  12204      550 
Factual questions         11847      411                  11728      344 
Informational questions   11272      1137                 10429      468 
List questions            10439      971                  9583       394 
Navigational questions    10961      1466                 8807       759 


Table 6: Time Delay on Different Question Types 


We know that many questions in IPL2 were answered by library science students, who might not differ 
greatly from the users of Yahoo! Answers in most demographic parameters. However, it is thanks to 
careful professional training in their studies, mutual help among peers, and close supervision by experienced 
librarians that the superiority of their answer quality has been noted in the literature (Wu & He, 2013). We 
observed 83 such cases of mutual help and close supervision among the 200 IPL2 questions. This could be a 
feature worth maintaining in future versions of IPL2 as well as implementing in Yahoo! Answers. 


3 Conclusion 


In this paper, we took IPL2 and Yahoo! Answers as the two samples for our case study of digital reference 
and community-based Q&A sites. We examined the services of the two systems based on 200 real questions 
raised in IPL2 and their similar questions found in Yahoo! Answers. Our analysis shows that the two 
systems classify their questions differently, and that the types of questions asked differ too. It took 
much longer to obtain answers from IPL2, whereas different types of questions in Yahoo! Answers 
generated dramatically different response times. However, some differences also demonstrate that there is a 
need to consider integrating some of the ideas in the two systems. Our future work includes studying more 
cases of digital reference and community-based Q&A sites, as well as examining in detail the response 
quality and the time taken to respond from digital reference and community-based Q&A sites, the feedback 
from the questioners, and the usability of the questions by other users. Ultimately, we want to investigate 
how to borrow insights from community-based question answering to improve digital reference and explore the 
development of hybrid tools. 
Abstract 

The study reported in this paper is part of an ongoing research project examining Asian immigrants’ 
information behaviour in South Australia. Involving eight Asian participants, the pilot study was 
conducted from March to April 2013. The study used questionnaires, photovoice, and interviews to collect 
data relating to participants’ information needs, information sources, and information grounds, 
attempting to capture both everyday and formal requirements for settling in South Australia. The 
preliminary results indicate that these immigrants have a diverse range of information needs, with various 
preferred information sources from multiple information grounds. Use of the Internet and strong virtual 
and real social networks are both important sources and grounds. The results indicate both the 
participants’ competencies and the challenges they have faced. The photographic images provide a further 
dimension to the analysis. 
1 Introduction 


According to the 2011 census, 6,489,874 people in Australia were born overseas, 
accounting for 30.2% of Australia’s population (Australian Bureau of Statistics, 2011). Twenty-three per 
cent of the immigrants are Asian, and the majority of them are from China, India, and Vietnam. In South 
Australia, immigrants make up 26.7% of the population, with 19.8% from Asian countries. An 
immigrant is defined as a person who was born overseas but either possesses an Australian 
permanent-resident visa or has become an Australian citizen. Making the adjustments required for settling into a new 
country, culture, and language requires many skills on the part of the immigrant and goodwill on the part 
of the country of destination. An inclusive multicultural society depends on respect for others’ heritage, 
with equal opportunities and support for all within the country. Because Australia is a country with a 
substantial number of immigrants, this study is important: it addresses a lack of evidence-based 
research on how immigrants in Australia deal with information needs and sources, and on their information 
grounds and information sharing during their immigration process. To make immigrants’ 
settlement process smoother, it is important to understand their information behaviour (Caidi et al., 
2010). 

Information behaviour, including information needs and information seeking, reflects a person’s 
needs to know certain things in a particular environment, as well as that person’s capacity to obtain such 
information. It may be directed and purposeful or received without any intention (Case, 2007); it may be 
acted on or not acted on in a person’s daily life (Pettigrew et al., 2001; Wilson, 2000). Information needs 
are often associated with, or lead to, the seeking of information. Everyday-life information seeking (ELIS) 
refers to the behaviour whereby people acquire information through their daily activities (such as watching 
television, meeting a friend, and visiting a doctor), behaviour in which they may not be seeking particular 
information purposefully (Pettigrew et al., 2001; Wilson, 1994). 

In understanding immigrants’ information behaviour, it is also important to discover where they 
find their information. Pettigrew (1999, p. 811) defines an information ground as “an environment 
temporarily created by the behaviour of people who have come together to perform a given task, but from 
which emerges a social atmosphere that fosters the spontaneous and serendipitous sharing of information”. 
Information grounds may include formal and informal settings, such as health community clinics (Pettigrew, 
1999), public libraries (Fisher et al., 2004), and social-networking sites (Counts & Fisher, 2010), as well as 
the local supermarket, café, and cultural gatherings. 

While some studies on immigrants’ information behaviour have been conducted in Israel (Shoham 
& Strauss, 2008) and New York (Fisher et al., 2004), there is only limited research into this area (Fisher et 
al., 2004). In addition, some noted studies related to information behaviour in Australia have focused on 
refugees or immigrants in general (Kennan et al., 2011; Lloyd et al., 2010 and 2013). Those papers have 
mainly discussed information and its relation to social inclusion (Kennan et al., 2011) or else information 
poverty, literacy, and social exclusion (Lloyd et al., 2010 and 2013). To the best of our knowledge, there 
are no prior in-depth studies of Asian immigrants’ information behaviour in Australia. 

Incorporating ELIS into more formal information-seeking behaviour is still exploratory (McKenzie, 
2003). The present study aims to identify Asian immigrants’ information needs, their information sources, 
and their information grounds during their settlement process in South Australia. The findings will 
provide empirical evidence that may assist in the formulation of policies to 
facilitate the settlement and social inclusion of immigrants and to better support service planning and 
management. The study reported in this paper addresses the following research questions: 


1. What kinds of information do Asian immigrants need? 
2. How do Asian immigrants search for or collect information to satisfy their information needs? 
3. What are Asian immigrants’ information grounds? 
4. Is there any aspect of their settlement, such as the challenges that they face and the strengths they 
have in a new country? 


2 Research Design 


2.1 Participants 

The target group of participants in this study was immigrants from Asian countries who lived in South 
Australia. Caidi et al. (2010, p. 495) define immigrants as follows: “international migrants include anyone 
living outside their country of citizenship but the condition of permanence in the term immigrant excludes 
those living abroad temporarily, such as visitors, migrant workers, and international students.” Accordingly, we set the 
following screening criteria for recruiting Asian participants in this study: those who were not born in 
Australia; hold permanent-resident visas or have become Australian citizens; are neither visitors nor 
international students; do not have an Australian partner; and currently live in metropolitan Adelaide, 
South Australia. 

Potential participants were approached initially through the researcher’s personal network and then 
through the participants’ networks, via e-mail, telephone, and face-to-face invitations. In the end, eight Asian 
immigrants (3 males and 5 females) from various backgrounds participated in this pilot study. They were 
from China (n = 2), India (n = 1), Vietnam (n = 2), South Korea (n = 1), Indonesia (n = 1), and Singapore 
(n = 1). Their ages ranged from 25 to 35. Most of the participants (n = 6) had 
been in Australia for less than five years and were thus considered newcomers (Caidi et al., 2010). Three of 
the participants were university students, while the rest were professionals working either in academic 
institutions or in the private sector. More than half of the participants had post-graduate degrees. Despite 
the various languages spoken by Asian immigrants, this study was conducted in English, since all the 
participants had good English competency and the researcher had no difficulties 
communicating with them in English. The data were collected from March to April 2013. 


2.2 Data collection instruments and procedures 


We employed a combination of questionnaires, photovoice, and interviews to collect data on 
immigrants’ information behaviour. With questionnaires we gathered demographic data on the participants, 
and gained a general understanding of their information behaviour with a combination of open-ended and 
closed-ended questions. To enrich these data, we used photovoice, in the forms of photos and personal 
stories, to allow the participants to express themselves. Finally, we used interview sessions to discuss the 
photos taken by the study participants. 

Photovoice is a relatively new technique in research methodology, “a process by which people can 
identify, represent, and enhance their community through a specific photographic technique” (Wang & 
Burris, 1997, p. 369). The data, in the forms of photos, enabled the exposition of ideas that could not easily 
be expressed by words and thus enabled further exploration (Briden, 2007; Wang & Burris, 1997; Weber, 
2008). Photovoice has been developed as a method that empowers the participants (Wang & Burris, 1997). 
One of the advantages of photovoice methods is that the participants feel more comfortable as they decide 
themselves what to capture (Julien et al., 2012). Photovoice was used in this study to explore the 
immigrants’ lived experiences more deeply, from their own perspectives. Interviews were employed to clarify 
and discuss images captured by the photovoice method. 

In the first phase, the researcher contacted all participants to set an initial meeting time. 
Questionnaires were sent to the participants by e-mail or by mail, as they requested. During the initial meeting, 
the data-gathering process was explained. The researcher also provided a short training session on 
what photos to take, including ethical considerations. For collecting the photo data, participants could 
use cameras provided by the researcher or their own equipment. The participants were given a week to 
capture eight to ten images that reflected aspects of their immigration process which indicated their 
information needs, their information sources, how they sought information, strengths and barriers in their 
information seeking, places where they met friends and shared information, and any interesting aspects 
related to their information behaviour. It turned out that all of the participants employed their own mobile 
phones to capture the photo data. 

Following the photovoice, around one week later, the researcher contacted the participants again 
to organise a time with each participant for the interview session to discuss the photos captured. In terms 
of image copyright, the researcher gained permission to use the photos in the publication. The interviews 
were conducted in places convenient to the participants, such as university libraries, malls, or the 
participants’ offices. Each interview was audio-recorded and ranged from 30 to 60 minutes in length. 


2.3 Data analysis 


Numerical statistics, presented as percentages, were used to analyse the questionnaires. The images from the 
photovoice method were analysed and classified using participatory analysis, which comprises 
selecting, contextualizing, and codifying (Wang & Burris, 1997). Images were classified into themes 
according to the story behind the picture as discussed with the participants. The interview data 
were transcribed and analysed by content analysis, and grouped into categories with open coding (Weber, 
1990). Finally, the researcher combined the groups of themes resulting from the interviews 
and the photovoice results. 
3 Preliminary results 


3.1 Information needs of Asian immigrants 


In a new country, addressing information needs is one of the challenges immigrants face (Caidi et al., 2012) 
and is considered to be an ongoing process. In this study, the areas that the participants ranked as the most 
important information needs in relation to their settlement in a new country were categorised into three 
types: personal information needs (i.e. individual needs), general information needs (i.e. environmental- 
location needs), and formal information needs (i.e. institutional and legal needs) (Table 1). The more 
specifically information needs are satisfied, the smoother the settlement process is (Shoham & Strauss, 
2008). 


Asian immigrants’ information needs 

Personal                                 General                    Formal 
Job/employment                           Accommodation              Immigration 
English literacy                         Transportation             Education 
Networking (friends/family/community)    Local culture/lifestyle    Tax assistance 
Computer skills                          City profile/orientation   Legal aid 
Health insurance                         Road safety and driving    Banking 


Table 1: Asian immigrants’ information needs 


Similar results on the personal and general information needs of US immigrants in Israel have been reported 
by Shoham and Strauss (2008). Formal information needs were identified as a distinct kind of need in 
this study. While Shoham and Strauss identified needs as personal and general, the inclusion of the 
formal category in this study enables a more thorough analysis. 

The information needs of one person may be different from those of others. In this study, all 
participants agreed that job/employment appeared as the biggest challenge. For example, information about 
job vacancies, job references, and how to apply for jobs was crucial, particularly when immigrants were not 
employed at their first arrival. As one participant said: 


Finding a job is the real challenge. I felt anxious every day about finding a job related to my field, 
as I did not know anybody here. I had to deal with online application and strategies to address the 
selection criteria. Any information regarding the vacancy was very useful [helpful]. I came to public 
libraries for job application advice. Finally, after I had tried several times, I got the job that I 
dreamt of (study participant 2). 


3.2 Information seeking and information sources 


With the availability of electronic substitutes for almost all manual information sources, the way people 
seek information is shifting. Amongst New Zealand immigrants, the Internet was the main source of 
information (55%) (Mason & Lamain, 2007). In a study in Ireland, family and friends (43%) and the Internet 
(35%) were identified as immigrants’ two main information sources (Komito, 2011). 

The participants in this study had their general and formal information needs met largely by the 
Internet (87.5%), while they sought their personal information mainly through families, friends, and 
colleagues in their broader social networks (62.5%). One third of the participants reported that they used 
social networks as an information source to find information on jobs and social activities. The participants 
used the Internet in their initial information searches, and then they took further action, such as telephoning 
the number provided by the websites for clarification or for requesting further information, especially when 
dealing with formal organisations in Australia, such as Immigration Offices and the Australian Department 
of Human Services, which is responsible for a range of payments and other services for immigrants. 


3.3 Information grounds and information sharing 


According to Fisher et al. (2004, p. 756), “information ground can occur anywhere [...] predicated on the 
presence of individuals”. In this study, homes, cafés, malls, offices, campuses, and libraries were various 
information grounds identified by the participants. Immigrants substantially used new information and 
communication technologies to exchange and share information. Most of the participants (75%) had online 
social-network accounts (such as Facebook, LinkedIn, and Twitter) to communicate, particularly with 
friends or family back in their home countries. Relying on Facebook, the participants said that they learnt 
from their friends’ practical experiences during settlement. This sort of personal information could not be 
gained from formal organisational websites, and online social networks played a significant role as 
information grounds and in information sharing. These results confirm the findings reported by Bates and 
Komito (2012), who agree that there is a strong connection between immigrants and social-media networks. 
Earlier, Counts and Fisher (2010, p. 104) claimed that “online settings can serve as the 
information ground”. Understanding the places (such as Facebook) where immigrants meet and share 
information allows services to be directed to appropriate sites. 


3.4 Photovoice results 


Using the photovoice technique to collect information behaviour data is an innovative dimension of this 
study. It is also highly relevant to the target group, whose members may face language barriers. 
In this study, photovoice was used to explore through images the immigrants’ ideas and 
experience during their settlement. As the researcher explored the stories behind the 
pictures, the participants expressed what they felt in a relaxed interview session. The participants found it 
easier to tell the stories represented by the pictures. 

All participants chose to use their mobile phones to capture an aspect of their experience that 
would be developed into an extended narrative with the researcher. A total of 38 photos were received from 
the participating immigrants. For lack of space, four representative images have been selected and presented 
below (Figure 1). 


Figure 1: Photovoice images of Asian immigrants’ information behaviour in Australia 


Figure 1 shows four selected photos (numbered clockwise from top left), illustrating immigrants’ information 
needs, information seeking and sources, challenges met during the settlement process, and information 
grounds. 
Photo 1: Information needs. This photo of housing billboards captured the information needs of 
understanding the process of finding accommodation and other issues such as the bond for house renting, 
legal contracts, and regular inspections. 

Photo 2: Information sources. This image of a South Australia Government website illustrated the 
role that the Internet plays as an initial and preferred information source for finding out information on the 
immigration process. 

Photo 3: Challenges. This image of an advertisement for English classes identified how important 
proficiency in English was perceived to be for adapting to life in Australia. It also depicted English as a 
challenge, particularly in the immigrants’ first few months after arrival. 

Photo 4: Information grounds and social networks. This photo shows a group of friends from
various cultural backgrounds sharing a meal, reflecting one place where immigrants meet, share, and
mingle with other people.

Photovoice is a valuable way to explore the immigrants’ information behaviour more
comprehensively, because each picture may tell a story based on the immigrants’ contexts. For example, one
participant took the picture of the house billboard (Photo 1), and it represented other related issues, such
as house bonds, legal matters, and inspections, that were completely new to that participant.


3.5 Aspects of settlement: challenges and strengths 


Australia provides many services to its immigrants, many of them online. The Asians in this study were all 
computer-literate, and so using the Internet to find out information may not have been a challenge for 
them. While technology literacy ensured access to online information, competency in the English language 
and understanding of cultural differences were still found to be major challenges, even after several years of 
living in Australia. English and cultural differences as barriers and challenges are consistent with the 
findings reported by Fisher et al. (2004) on immigrants in the US. 

Living in a multicultural society allows the immigrants to maintain their cultural identities. Our 
results indicate that immigrants’ computer literacy allows them to become transnational citizens (Vertovec, 
2004), hopefully adapting well to Australia and being able to maintain their connections with their countries 
of origin. While all participants assessed their reading skills as reasonable, coping with the spoken language 
and adjusting to the Australian accent were perceived as difficulties, particularly in the first few months 
after arrival. Some cultural differences also emerged as challenges when the immigrants first arrived, 
including the custom of addressing everyone casually by their first names; the man not being the one who 
always pays for a meal; visits to friends’ homes having to be previously arranged; shops closing early; and 
there being less frequent public transportation on weekends and public holidays. 


4 Conclusion and further research 


Understanding the information behaviour of Asian immigrants is crucial to such countries as Australia, 
which have a large proportion of immigrants in their populations. It is important to identify immigrants’
information needs, information sources, and information grounds, as well as their strengths and challenges 
in the settlement process. All of this information will provide valuable empirical evidence, both for policy 
makers, who are involved in the immigration processes and seeking to provide better services, and for the 
immigrants themselves, who may then benefit from having services and pathways that fit their needs and 
reflect their patterns of behaviour. 

With eight educated young-adult Asian immigrants as participants, some initial findings have
emerged. Although the study is preliminary, its findings give a first picture of immigrants’ information
behaviour in Australia. Three categories of information needs (personal, general, and formal) have been
identified, and may provide an initial framework of what Asian immigrants need in their settlement. The
Internet has been found to be their initial and preferred information source, with family and friendship
networks not far behind. Computer literacy has become the study participants’ strength in the settlement
process, allowing them to adapt to their new surroundings. Social-networking sites are considered the most
important information ground. In this rapidly changing information environment, the photos captured with
study participants’ mobile phones have been used to represent the participants’ voices and have strengthened
the evidence gathered. The role played by technology has led to the development of social networks and
allowed the immigrants to be transnational citizens.

While in this pilot study English was not a barrier between the researcher and the participants, in
our future study we shall seek interpreters to help the researcher communicate with participants where
required. We are currently recruiting more immigrants (some 300 participants) to participate in the ongoing
study, which considers demographic differences such as a wider range of ages and education levels. Stratified
sampling (Fowler, 2009) will be used as a purposive sampling technique to ensure that the study reflects
the total immigrant population. Future analysis may include reference to how age, level of education, period
of stay, and gender influence the immigrants’ information behaviour and identify their possible
interrelationships.
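
As a rough illustration of how a stratified sample of about 300 participants might be divided across strata,
the following Python sketch allocates participants in proportion to stratum size. The strata names and
population counts are hypothetical placeholders, not figures from this study, and proportional allocation
with largest-remainder rounding is only one possible allocation rule.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes."""
    total = sum(strata_sizes.values())
    # Exact (possibly fractional) quota for every stratum.
    quotas = {name: sample_size * size / total for name, size in strata_sizes.items()}
    # Start from the whole-number part of each quota.
    allocation = {name: int(quota) for name, quota in quotas.items()}
    leftover = sample_size - sum(allocation.values())
    # Give the remaining places to the strata with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical age strata and population counts, for illustration only.
HYPOTHETICAL_STRATA = {"18-29": 4000, "30-44": 3500, "45+": 2500}
allocation = proportional_allocation(HYPOTHETICAL_STRATA, 300)
print(allocation)
```

Whatever the real strata turn out to be, the allocation always sums to the intended sample size, which keeps
the sample proportions aligned with the population proportions.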

Abstract

The system of modern higher education has gone through many reforms, but still features many deficits
in terms of knowledge acquisition and learning methods. The concept of gamification, which means the
implementation of game elements in non-game contexts, offers a possible solution by increasing
motivation and participation among the students. Therefore, the project The Legend of Zyren was
initiated to mediate learning contents via a gamified framework. This part of the study focuses on the
so-called guild quests (group tasks) and their principles of construction regarding the collaborative and the
competitive game pattern, which ultimately result in an increase in learning success. The results of a
final evaluation confirm the usefulness of game elements and game patterns with regard to
content mastery and learning progress.

1 Introduction — Gamification in Academic Teaching

The landscape of higher education shows many deficits regarding knowledge acquisition and teaching
methods. The lack of motivation in particular is one of the major problems that have to be overcome to
ensure optimal learning success (Lee & Hammer, 2011).

The concept of gamification offers a solution to these problems, as games drive motivation and
engagement (Prensky, 2003, p.1). Gamification means the use of game elements in non-game contexts, which
serve to motivate and engage users in certain actions (Deterding et al., 2011). Game elements include
aspects such as story, experience points (XP) and levels, achievements, leaderboards and rankings, and the
so-called quests, which have to be implemented in the given context (Fecher, 2012). Putting a game in the
role of a mediator of learning content also enables users to acquire knowledge in a different way. Knowledge
acquisition with the help of game patterns embedded in increasing difficulty levels creates a cycle of expertise
and experienced learning that reinforces content mastery (Bereiter & Scardamalia, 1993; Gee, 2007).
Therefore, the project The Legend of Zyren was initiated at the Department of Information Science at the
University of Düsseldorf to implement game elements in the academic teaching context and investigate
their influence on the learning success of the students. The project was organized in three parts that were
coordinated with each other to teach the content of the subject Knowledge Representation. The basic
contents of the subject are conveyed in a classical lecture, whereas the tutorial aims at intensifying and
consolidating the knowledge. Additionally, a web-based platform was designed, where the students can learn
on their own in the form of a text-based learning adventure. The ultimate goal of the adventure is to achieve a
reward in terms of a better grade in the final exam with the help of experience points (XP). All of the
parts were embedded in the framework of the epic fantasy adventure The Legend of Zyren, in which the
students go on a mission through the realm of Zyren to find the mysterious book of knowledge. On their
journey the students have to face many dangers and challenges in the form of quests. Besides the virtual
quests that the students encounter in the text-based adventure on the platform, there are also real-life
quests, which are played in the tutorial to deepen the learning content. These quests have to be played in
groups, the so-called guilds, to progress in the adventure and to gain more experience points to achieve the
epic reward at the end of the semester. These guild quests, their construction principles and their actual
implementation are the topic of this note.


2 Guild Quests — Learning Success through Clever Implementation of Game Patterns 


As already mentioned, quests are game elements and represent a part of the framework of gamification.
Quests are obstacles in the linear flow of the story that a player has to overcome to advance in the game.
They can be regarded as exercises which can be solved either alone or in a group. A quest is composed of
different components. It has to feature a clearly defined goal and the quest, or the game, itself with its rules
and principles. Furthermore, there has to be an impulse of the story that initiates the quest and causes the
need for the player to solve it. The last component of a quest is the reward the player will gain after
accomplishing it, which motivates players and promotes their engagement. Therefore, a quest is
always linked to the obtainment of achievements or experience points, which are directly correlated to the
player’s progress in a given game (Fecher, 2012, p.2).

Special forms of groups the players may have to form to solve a quest are the so-called guilds.
Guilds are constructs derived from Massively Multiplayer Online Roleplaying Games (MMORPGs) and can
be regarded as a special type of team formation. A guild is an “association of players who chose to come
together to achieve a common goal” (Riegle & Matejka, 2006, p.1). Guilds are considered to function as a
“ready-made pool of players who have already established a relationship with each other and who will group
together to accomplish quests” (Riegle & Matejka, 2006, p.1). A guild is composed of members such as guild
leader, guild officers or guild members who perform different functions in the team. A guild is formed by
its members themselves so that they are solely responsible for the guild’s construction and performance
(Riegle & Matejka, 2006).

However, using guild quests to achieve learning success requires a clever implementation of game
patterns that correlate with each other. Sebastian Kelle (2012) defines two important patterns that
contribute to the learning success: the cooperative pattern and the competitive pattern (Figure 1). The
implementation of these patterns aims at achieving a balance between knowledge acquisition through
teamwork and engagement through competition, so that “learners should be motivated and ‘drawn’ into
the game, but not overly distracted from the learning goal” (Kelle, 2012, p.14).

The collaborative pattern considers the intragroup relationships and focuses on the interaction of
the members in a given group (Figure 1). It aims at sharing, constructing and expanding collective
knowledge and understanding (Romero et al., 2012). Besides knowledge acquisition, the
collaborative aspect fosters the development of interpersonal competencies “such as negotiation,
collaborative decision-making and creative problem resolution” (Romero et al., 2012, p.4) and positive
interdependence in terms of socializing or team spirit (Romero et al., 2012, p.6). As groups are often
placed in competition with each other, the competitive pattern regards the intergroup relationships
between the several teams. This pattern aims at providing motivation and promoting engagement to avoid
boredom and keep the players drawn to the game. Important factors that contribute to a successful
implementation of the competitive pattern are conflict, challenge, opposition and conversation (Prensky,
2001).

The learning content is embedded and mediated in various stages of a game that have to be 
accomplished to win certain challenges, achieve points and reach a higher level. The content is therefore 
bound to an interactive context that enables the students to “regard themselves as capable of meaningfully 
applying disciplinary content” (Barab et al., 2012, p.520). This form of knowledge acquisition opens a new 
dimension of possible understanding and self-reflective creative learning strategies that the students have 
to develop autonomously. As thinking and learning are connected to actual experiences, memorability and 
content mastery are supported by this use of playful interactive elements (Gee, 2007, p.9). The knowledge 
which is acquired at one stage of a game has to be applied and intensified to accomplish the next one.
Feedback loops (Salen et al., 2011, p.12) between the levels and the visual manifestation of the learning
progress in the form of points, levels and rankings (Fecher, 2012, p.2) ensure a process of knowledge acquisition
that is critically reflected and controlled by the players. They are able to identify their deficits and improve 
their skills, motivated by the aspect of competition, to accomplish the next challenge. This cyclic 
implementation and the constant upward movement of the difficulty levels of a game create a cycle of 
expertise and support content mastery and consolidation (Bereiter & Scardamalia, 1989). Both patterns in 
combination with an innovative way of knowledge acquisition and content consolidation serve to achieve 
learning success and contribute to the successful implementation of gamification in academic teaching. 

A project that made use of game elements for higher education can be found at Indiana
University near Chicago. In his work The Multiplayer Classroom, game designer Lee Sheldon (2012)
organized whole seminars according to the principles of a multiplayer game in which students were rewarded
with experience points (XP) for every solved quest. Unlike in a classical lecture, where the students lose points
in case of a wrong answer, the achievement of XP is solely bound to the positive notion of a benefit, as they
can only be gained but not lost. Another related project was initiated at the Institute of Play of a New
York public school. The researchers of the project Quest to Learn set up a modularized game-based system
in which the students had to solve quests as subunits of a larger unit of study that equipped them with
the necessary data and knowledge to solve the larger mission. A special focus was set on the aspect of system
thinking and dynamic and interactive learning (Salen et al., 2011). A similar approach can be found in the
project Quest Atlantis, which represents a story-based curriculum to teach persuasive writing. The learning
content of the subject is embedded in an interactive narrative context to create a learning experience in
which the user practices the meaningful application of the disciplinary content (Barab et al., 2012).


Figure 1: Function of Game Patterns (derived and modified from Kelle (2012) and Romero et al. (2012))


3 Implementation of Guild Quests


The conception of the guild quests began as early as half a year before the actual start of the project in
the form of a master’s course. The students of this course developed initial ideas for the realization of the
quests, which their fellow students of the bachelor course Knowledge Representation had to solve in their
guilds later on. Every guild quest aims to convey the contents of Knowledge Representation in a scientific
and playful way and is constructed according to the principles of a quest with special regard to the
collaborative and competitive pattern. The concept resulted in the realization of 13 guild quests, one for
each session of the tutorial during the semester.

To ensure an optimal implementation of the guild quests, the students were organized in 3 tutorials,
each of them containing a similar number of participants (50-60). The tutorials were supervised by 13 tutors,
who were responsible for the realization of the several guild quests. The students had to form guilds and
define their roles in the team themselves. Furthermore, they had to invent a name for their guild and a
challenge claim to introduce themselves when they were confronted with the other guilds with which they had
to compete. This can be regarded as a first step towards socializing and team spirit in terms of the intragroup
relationships of the collaborative pattern and additionally introduces the competitive component regarding
the challenge claim and the opposition with the other guilds.

Regarding the structure of the guild quests, three types can be distinguished.
They are designed like popular parlor games, or they are set up as treasure hunts, in which the
guilds have to find several stations at which they have to prove their knowledge, or as quizzes, which focus
on the question-and-answer principle. The learning content mediated in the lecture prepared the students for
the competition in the tutorial, which integrated the acquired content in the form of questions whose
difficulty level increased through the course of the semester. To establish a cycle of expertise and ensure
content mastery and consolidation, every quest built on the former quest regarding the tested
learning content. Feedback loops and the visualized ranking of the guilds on the online platform depict the
current state of knowledge and indicate possible deficits in the learning progress.


Columns: Week, Title, Game Principle (Game, Goal, Story, Reward), Content

Week 1: It’s not Easy to be a Hero (orig. Ein Held zu sein ist nicht leicht)
  Game: Card game in which the guilds compete against each other.
  Goal: Putting as many cards as possible in the right chronological order.
  Story: The orc Omgha has to be convinced that the guild is ready to embark on the adventure to find the
         book of knowledge.
  Reward: 1 XP.
  Content: History of Knowledge Representation

Week 2: Plopp means Stop (orig. Denn Plopp heißt Stopp)
  Game: Quiz in which the guilds compete against each other and have to jump onto the right field on the floor.
  Goal: Achieve as many right answers/jumps as possible.
  Story: Two dwarfs challenge the guilds.
  Reward: 1 XP.
  Content: Basic terms of Knowledge Representation, Terms and their definitions

Week 3: War of Terms (orig. Schlacht der Begriffe)
  Game: Board game in which the guilds have to gain streets through the explanation of terms and can demand
        rent from other guilds.
  Goal: Gaining as many streets as possible.
  Story: The guild is partying in a pub and plays a board game against other guilds.
  Reward: 2 XP.
  Content: Term orders

Week 4: The Golden Tower (orig. Der Turm aus Gold)
  Game: Game of skill in which the guilds have to pull a stone out of an unstable tower, if they give a wrong
        answer.
  Goal: Achieve as many right answers as possible so that the tower remains.
  Story: The guilds have to save a golden tower from tumbling and therefore save their lives.
  Reward: 2 XP.
  Content: Hermeneutics of information, Bibliographical meta data

Week 5: The Terrible Jabberdy (orig. Der gefürchtete Jabberdy)
  Game: Quiz in which the guilds have to find the right question for a given answer.
  Goal: Achieve as many right questions as possible.
  Story: The guilds have to fight against the monster Jabberdy.
  Reward: 3 XP.
  Content: Repetition

Week 6: Stroke of Fate (orig. Schicksalsschlag)
  Game: Quiz in which the guilds have to guess whether a given statement is right or wrong.
  Goal: Make as many right assumptions as possible.
  Story: The guilds have to prove their knowledge of fortunetelling at the house of a fortuneteller.
  Reward: 4 XP.
  Content: Meta Data, Non-thematical filters of information

Week 7: Gotta Catch ‘Em All (orig. Komm schnapp sie dir)
  Game: Treasure hunt, in which the guilds have to challenge the tutors to collect 8 medals.
  Goal: Collect as many medals as possible.
  Story: The guilds have to collect the 8 medals by challenging the smartest creatures in a forest to cross a
         river with a ferry.
  Reward: 6 XP.
  Content: Collaborative content indexing, Processing of tags, Nomenclature

Week 8: Tabuta (orig. Tabuta)
  Game: Game in which a member of a guild has to explain or draw a term, while the others have to guess its
        meaning.
  Goal: Guessing as many meanings as possible.
  Story: The guilds have to help the tribe of the Tabuta to figure out which words are permitted for usage in
         the forest.
  Reward: 6-8 XP.
  Content: Repetition

Week 9: En garde! Touché! (orig. En garde! Touché)
  Game: Quiz in which 2-3 guilds have to duel each other.
  Goal: Achieve more right answers than the opponent.
  Story: The guilds have to duel with hostile pirates.
  Reward: 3-8 XP.
  Content: Classification, Thesaurus

Week 10: The Facets of Murder (orig. Mord in all seinen Facetten)
  Game: Treasure hunt in which the guilds have to find clues of the murder, crime scene and weapon at several
        stations by solving riddles.
  Goal: Collect as many clues as possible to solve the murder case.
  Story: The guilds have to solve a murder case and find out the right murder, crime scene and weapon.
  Reward: 4-13 XP.
  Content: Ontology, Facetted KOS, Crosswalks between KOS

Week 11: Zyren goes Hollywood (orig. Zyren goes Hollywood)
  Game: Quiz in which the guilds have to give the right answer to a question.
  Goal: Achieve as many right answers as possible.
  Story: The guilds are captured in a mine and have to struggle for food through gaming.
  Reward: 8-14 XP.
  Content: Citation Analysis

Week 12: Dragon Wars (orig. Dragon Wars)
  Game: Treasure hunt in which the guilds have to fight against dragons at several stations to expand their
        skills for the fight against the final enemy.
  Goal: Defeat as many dragons as possible to collect many skills for the final fight.
  Story: The guilds have to defeat dragons to save an area of Zyren.
  Reward: 0-16 XP.
  Content: Intellectual indexing, Automatic indexing

Week 13: The End of a Long Journey (orig. Das Ende einer langen Reise)
  Game: Quiz in which the guilds have to give a right answer to a question.
  Goal: Achieve as many right answers as possible.
  Story: After the guilds drank a magical drink they experience their whole journey through the realm of Zyren
         again and travel places at which they have to solve riddles.
  Reward: 10-19 XP.
  Content: Abstracts, Automatic extraction of information, Repetition


Table 1: Structure of the Guild Quests


A guild quest started with the creation of an epic and mystical atmosphere through animations, videos
and music and with the narration of the story that gave the impulse for the students to solve the quest with
the help of their guild. Afterwards, the rules of the quest were explained and the goal and rewards of the
mission were clearly defined.

As the students had to work in teams inside their guilds, but still had to fight against the other
guilds, both the collaborative and the competitive pattern were integrated in the concept. The collaborative
aspect was supported by the fact that every guild member actively participated in the quest and negotiated the
possible solution with his or her team members, which reinforced collaborative decision making and creative
problem resolution, as well as the team spirit. The result of this form of collaboration was a process of
collective knowledge sharing and expanding that also supported the individual learning progress of the
students. The competitive pattern with its aspects of challenge, opposition, conflict and conversation, which
can be regarded as an additional incentive for the guilds to win the quest, was supported by four factors
correlated to the general construction principles of a quest: the game itself with its rules, the clearly defined
goal, the impulse of the story and the reward afterwards. As the most successful guilds achieve a reward in
the form of XP that they need for the successful passing of the course, the aspect of reward is obviously the
strongest support for competition. This is also reinforced by the ranking of the guilds, since they are motivated
to overtake other guilds and achieve a higher ranking. Furthermore, the XP that can be gained through
the guild quests are an additional bonus for every member of the guild, as they contribute to the possibility
of improving their grades in the final exam at the end of the semester.

To provide new impulses for the guild quests, the game, goal, story and reward had to be redeveloped
every week (Table 1) to maintain the students’ interest and motivation and to ensure a constant and effective
learning progress.
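
The weekly rewards in Table 1 can be added up to see how the growing rewards accumulate over the
semester. The short Python sketch below does this; the variable and function names are illustrative, and
the reward ranges are copied from Table 1. Under these values the sum of the weekly maxima comes to 97 XP,
which agrees with the maximum of 97 points stated in the guild ranking (Table 2).

```python
# Reward range per week as listed in Table 1: week -> (minimum XP, maximum XP).
REWARDS = {
    1: (1, 1), 2: (1, 1), 3: (2, 2), 4: (2, 2), 5: (3, 3),
    6: (4, 4), 7: (6, 6), 8: (6, 8), 9: (3, 8), 10: (4, 13),
    11: (8, 14), 12: (0, 16), 13: (10, 19),
}


def max_total_xp(rewards):
    """Sum the maximum reward of every weekly guild quest."""
    return sum(high for _low, high in rewards.values())


def cumulative_max_xp(rewards):
    """Running total of the maximum XP after each week, in week order."""
    running, totals = 0, {}
    for week in sorted(rewards):
        running += rewards[week][1]
        totals[week] = running
    return totals


total = max_total_xp(REWARDS)
cumulative = cumulative_max_xp(REWARDS)
print(total)
print(cumulative)
```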


4 Evaluation and Results 


At the end of the semester the whole project The Legend of Zyren was evaluated regarding various factors.
The tutorial and guild quests were analyzed with special regard to the implemented game
patterns and their influence on the learning success.

As Figures 2 and 3 suggest, both patterns had a positive influence on the students. 89,2% of the
students experienced a positive effect of the collaborative aspects on their learning behaviour and 72,9%
perceived the competition between the guilds as a motivating factor.


[Figure 2 chart not reproduced. Legend: totally disagree, disagree, agree, totally agree, no opinion.
Recoverable value labels: 49,1% and 40,2%. n=112]


Figure 2: Evaluation of Collaborative Pattern (“The aspect of collaboration in the guild had a positive
influence on me and my learning progress.”)


[Figure 3 chart not reproduced. Legend (as far as recoverable): totally disagree, totally agree, no opinion.
Recoverable value labels: 0,9%, 9,0%, 20,8%, 17,1%, 44,1%. n=112]


Figure 3: Evaluation of Competitive Pattern (“The competition with the other guilds motivated me.”)


Ranking  Guild                                Total points  Act 1  Act 2  Act 3  Act 4
                                              (max. 97)
1        Nightmare on Elfstreet               80            5      12     21     42
2        n-1                                  71            6      10     3      42
3        Wächter des Wissens                  66            5      18     4      39
4        Nedlig                               65            0      11     8      36
4        Stardust                             65            2      6      8      39
6        Yojung                               62            2      5      6      39
7        Brave Vesperia                       61            5      10     6      30
8        Die drei dreisten zwei               59            0      0      21     38
9        The amazing knights of knowledge     58            3      8      8      39
9        InfoPro                              58            2      8      3      35
11       Nordzyrea                            53            3      12     8      30
11       Super wilde Arbeits Gruppe [SWAG]    53            0      6      3      34
11       5th Row                              53            4      12     9      28
14       Guards of the Information            51            0      8      3      30
15       Die Knechte von Zyren                49            1      4      6      28
16       Willi und die Majas                  46            5      0      5      26
16       Warmduscher                          46            1      8      6      21
18       Gilde Gildo                          45            0      6      7      32
19       Die Elfen Helfen                     44            0      6      3      25
19       Red Army                             44            0      0      8      26
21       Die Zyranier                         42            0      8      4      30
22       Guild Whores 5+1                     37            0      0      6      21
23       Heinrichs Armee                      30            0      0      8      22
24       Die Minions                          22            0      0      8      14

The columns Act 1 to Act 4 give the points achieved in the respective act.


Table 2: Guild Ranking 


Table 2 illustrates the visualized ranking of the guilds, which was accessible via the platform. It shows the
amount of XP that the guilds managed to achieve and furthermore depicts the correlation between the
increasing distribution of XP and the rising difficulty level of the quests. The learning progress is directly
mirrored in the amount of XP the guilds could achieve in the various quests of a certain act. Members of a
guild can directly trace in which act they achieved only a few points and which parts of the content they may
have to repeat to gain a higher position in the ranking.
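
The rank numbers in Table 2 follow the pattern of a standard competition ranking: guilds with equal totals
share a rank and the following rank is skipped (for example 4, 4, 6). The Python sketch below reproduces this
for the six highest totals in Table 2. The function name is illustrative, and how the platform actually
computed the ranking is not documented here, so this is only one way to obtain the same numbers.

```python
def competition_rank(totals):
    """Rank guilds by total XP; ties share a rank and the next rank is skipped."""
    ordered = sorted(totals.items(), key=lambda item: -item[1])
    ranks = {}
    previous_score, previous_rank = None, 0
    for position, (guild, score) in enumerate(ordered, start=1):
        # A guild with the same score as the one before it keeps that guild's rank.
        rank = previous_rank if score == previous_score else position
        ranks[guild] = rank
        previous_score, previous_rank = score, rank
    return ranks


# Total points of the six best guilds, copied from Table 2.
TOTALS = {
    "Nightmare on Elfstreet": 80,
    "n-1": 71,
    "Wächter des Wissens": 66,
    "Nedlig": 65,
    "Stardust": 65,
    "Yojung": 62,
}
ranks = competition_rank(TOTALS)
print(ranks)
```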

As this visualization was also implemented to connect the report of the current state of knowledge
to the aspect of competition, this aspect was evaluated as well (Figure 4). This item was also analyzed
because of the possible motivating effect that the ranking had on content mastery, as it provides
information about possible deficits in the individual state of knowledge.


[Figure 4 chart not reproduced. Legend: totally disagree, disagree, agree, totally agree, no opinion.
Recoverable value label: 18,9%. n=90]


Figure 4: The Aspect of Competition Attached to the Ranking (“The visualized ranking raised my
competitive spirit.”)


66,6% of the students perceived a sense of competition with regard to the ranking. This indicates a clear
motivating function of this form of visualized learning progress and a supporting function of competition in
terms of knowledge acquisition and content mastery.


[Figure 5 chart not reproduced. Categories: Personal Ambition (n=112), Motivation (n=112), Fun (n=111),
Subjective Learning Success (n=111)]


Figure 5: Positive Effects of the Guild Quests


Additionally, the students were asked whether they experienced certain positive aspects in the sessions of
the tutorial (Figure 5). The results were very positive as well, since 81% experienced the aspect
“Motivation”, 92% the aspect “Fun” and 87% the aspect “Personal Ambition” as reinforced by the guild
quests of the tutorial. Another factor surveyed was the perception of a subjective learning success, which
67% strongly confirmed. The positive outcome of this particular result indicates that the guild quests and
their construction according to the game patterns support the subjective learning success as suggested by
the theoretical model.

To measure the concrete success of the whole concept beyond the subjective perceptions of the
students, the new gamified concept was directly compared to the old concept (teacher-centered teaching) in
terms of the students’ grades in the final exam (Table 3). The shares of the grades “very good” and “good”
each rose by more than 10 percentage points (10,99 and 10,44, respectively) compared to the old concept,
whereas the failure rate decreased by 17,45 percentage points. The average grade under the new concept was
0,74 grade points better than under the old concept (2,80 versus 3,54). As already indicated by the students’
perceptions, these results support the usefulness of the gamified concept regarding learning success from an
objective perspective as well.


                                                  Summer semester 2012   Summer semester 2013   Difference
                                                  (old concept)          (new concept)
Students (n)                                      84                     94                     -
Average grade (including fails)                   3,54                   2,80                   +0,74
Passed (in total)                                 55,95%                 73,40%                 +17,45%
Passed with “very good” (1,0 / 1,3)               16,67%                 27,66%                 +10,99%
Passed with “good” (1,7 / 2,0 / 2,3)              11,90%                 22,34%                 +10,44%
Passed with “satisfactory” (2,7 / 3,0 / 3,3)      17,86%                 18,09%                 +0,23%
Passed with “adequate” (3,7 / 4,0)                9,52%                  5,32%                  -4,2%
Failed with “inadequate” (5,0)                    44,05%                 26,60%                 -17,45%

Table 3: Results of the Final Exam (Summer 2012 and Summer 2013) 
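
The differences reported in Table 3 are differences between two percentages, that is, percentage points.
The following Python sketch recomputes them from the two semester rows; the dictionary keys are
illustrative names, while the numbers are taken from Table 3 (with decimal commas written as points).

```python
# Shares in percent, copied from Table 3.
OLD_2012 = {"passed": 55.95, "very_good": 16.67, "good": 11.90,
            "satisfactory": 17.86, "adequate": 9.52, "failed": 44.05}
NEW_2013 = {"passed": 73.40, "very_good": 27.66, "good": 22.34,
            "satisfactory": 18.09, "adequate": 5.32, "failed": 26.60}


def percentage_point_change(old, new):
    """Return new minus old for every grade band, rounded to two decimals."""
    return {band: round(new[band] - old[band], 2) for band in old}


changes = percentage_point_change(OLD_2012, NEW_2013)
for band, delta in changes.items():
    print(f"{band:>12}: {delta:+.2f} percentage points")

# On this grading scale lower values are better (1,0 is the best grade, 5,0 a fail),
# so a lower average grade is an improvement.
average_improvement = round(3.54 - 2.80, 2)
print(f"average grade improved by {average_improvement:.2f} grade points")
```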


5 Discussion 


There have been many negative opinions on gamification in the past, claiming it to be a primitive marketing
concept devised by bored business consultancies to increase their sales (e.g. Bogost, 2011; Robertson,
2011).

However, as the results of the illustrated evaluation suggest, gamification is more than a marketing
concept. Its positive effect on the non-game context of higher education is clearly traceable. The
implementation of game elements and game patterns reinforces many positive aspects of the learning
process. As suggested by the theoretical model, the collaborative pattern drives positive interdependences
and results in a positive influence on the students’ behaviour regarding the aspect of collective knowledge
sharing and expanding. Apart from that, the competitive pattern clearly increases motivation and
engagement, which shows that all aspects of the pattern were successfully implemented in the guild quests.
In combination with an innovative and dynamic way of knowledge acquisition and content mastery and
consolidation via cyclic development of expertise, the presented game patterns had a traceable positive
influence on the learning success of the students. This initial assumption is confirmed by the positive
outcome of the evaluation regarding the various factors, both from the subjective perspective of the students
and from a statistical comparison of the different concepts. Therefore, it can be said that the guild
quests represent an effective form of the implementation of game elements in the academic context and
form an ideal supplement to the other parts of the project The Legend of Zyren.
Abstract 

Fostering intellectual diversity in the iSchools is a critical task and central to the unique iSchool vision. 
However, beyond recent efforts to track hiring patterns and figure out the representation of various 
disciplines within the iSchool community, there is currently a lack of empirical research about cross- 
disciplinary activity within iSchool faculties. In this research note, which seeks to build on and complicate 
a recent paper by Wiggins and Sawyer (2012), we foreground the various zones and activities that make 
up everyday iSchool life instead of discussing the iSchool as a coherent unit. Specifically, we examine 
faculty involvement with the dissertation production process as a potentially key zone of cross-disciplinary 
faculty contact and exchange. We also explore the use of “acknowledgement analysis,” a relatively 
unexplored method for studying academic social networks. Our findings, based on analyzing the 
acknowledgements of every dissertation published in 2010 (N=78) by a sample of 15 research-intensive 
iSchools, suggest that the dissertation production process is a site of cross-disciplinary activity but not 
evenly so across the various disciplines populating the iSchools. Some discipline areas within the iSchools 
engage in cross-disciplinary exchange more frequently than others and with a more diverse array of 
intellectual interlocutors. 
1 Introduction 


This research note concerns the intellectual diversity of the iSchools. It seeks to build on, and productively 
complicate, an important study recently published by Wiggins and Sawyer (2012) in which the authors 
approach the iSchools as “a naturally occurring experiment in the creation of interdisciplinary academic 
units” (p. 20) and develop a classification system for measuring intellectual diversity within individual 
iSchools, within clusters of iSchools, and across the entire iSchool community. As Timothy Mitchell (2005, 
p. 316) has pointed out, the concept of “a natural experiment” can be deceptive because such phenomena 
are not typically one large experiment unfolding but, on the contrary, many related experiments coalescing 
into what comes to appear singular and “natural.” Mitchell’s observations about how natural experiments 
typically work are key to understanding the research presented in this note. 

As has been well documented in the small but existing literature on them, many of the iSchools are 
amalgamations of older disciplines, departments, schools, and research fields (Olson & Grudin, 2009; 
Bonnici, Subramaniam, & Burnett, 2009). iSchools are also “moving objects” that continue to innovate and 
self-adapt in real time. Responding to the lack of empirical data within the recent discussions of iSchool 
research cultures, Wiggins and Sawyer set out to measure and make sense of the iSchool community’s 
complex organizational and intellectual topography. Their research took place in 2009, when there were 32 
iSchools. Using a 21-school sample, Wiggins and Sawyer classified and grouped all (tenure-stream or 
tenured) iSchool faculty members based on the “discipline area” in which each faculty member had received 
their PhD (e.g., humanities, management, computing, education, et cetera). According to the authors, this 
approach was developed based on the notion that a PhD can be used “as a proxy for intellectual interests 
and domain expertise” (p. 12) that may change over time, as individual researchers develop new areas of 
specialization, but never entirely subsides and thus has validity as a general marker of discipline-specific 
knowledge and orientation. After classifying every faculty member in their sample, Wiggins and Sawyer were 
able to analyze individual iSchools based on the disciplines populating them, cluster the iSchools into various 
disciplinary leanings (e.g., sociotechnical, library, niche, etc.), and create a provocative snapshot of the 
larger intellectual currents that existed just before the period of tremendous globalization now underway 
that has new schools and researchers quickly entering the iSchool community. 

As Wiggins and Sawyer note, one limitation of their study was its reliance on secondary data 
collected from iSchool websites and from the ProQuest UMI Dissertation Abstracts database. Nonetheless, 
their paper breaks new ground in modeling how we might subject the iSchools to the same kind of empirical 
analysis that we use when studying knowledge production networks in the sciences, social sciences, and 
humanities. Building on this idea and seeking to further the line of inquiry, we suggest that a second possible 
limitation within their study is a subtle but underlying assumption that the co-location of researchers from 
distinct intellectual traditions is a robust indicator of intellectual diversity and cross-disciplinary activity. 
The research presented in this note takes as its starting point the possibility that some iSchool faculty 
researchers from distinct research fields and disciplines may have little to no engagement with each other 
beyond the physical proximity of working together in the same school facilities. Put simply, while mapping 
person-to-person proximity can reveal hiring patterns and discipline-specific representation across the 
iSchool community, important things to know, something not quite captured by that approach is the 
character or volume of whatever intellectual intermixing is (or is not) happening within any given iSchool. 

Following Mitchell’s point cited above, we suggest that iSchools are in fact comprised of many small 
experiments in cross-disciplinary activity and that these experiments take place within everyday “contact 
zones” of faculty intermingling and collision, to borrow a term from Mary Louise Pratt (1998, p. 34). The 
advantage of this framework over an approach that foregrounds physical co-location, we argue, is that it 
re-scales intellectual diversity into doable units of activity that are open to small-scale interventions and 
tinkerings, as well as open to empirical study on a zone-by-zone or experiment-by-experiment basis. Some 
of the likely cross-disciplinary faculty contact zones within an iSchool might include: hiring committees, 
curriculum committees, co-PI projects, co-teaching arrangements, special school-wide initiatives, 
administrative and governance bodies, special research projects, and more. We suggest such zones function 
like hot spots on a climate map, as discrete sites of exposure in which faculty researchers encounter terms, 
ideas, and approaches from other disciplines; these zones are where intellectual cross-pollination does or 
does not happen on a quotidian basis within the iSchools. Of course, not all such zones are open to 
empirical study and investigation by outsiders. Some involve closed or confidential discussions, or they fail 
to generate records and data for analysis. 

This research note presents some of our early findings about the iSchool dissertation process as a 
potentially key contact zone that facilitates cross-disciplinary exchange and activity among iSchool faculty 
members. We chose to begin our investigations of intellectual diversity with the dissertation process because 
the dissertation process often, but not always, results in a published knowledge product, the dissertation, 
and that knowledge product typically contains some kind of record, usually in the form of an 
acknowledgements section, that identifies the various faculty researchers involved in the dissertation process 
regardless of their disciplinary background or positioning. This research note also has a secondary “proof of 
concept” agenda in that we discuss a relatively novel method called “acknowledgement analysis.” 
The two research questions guiding this work are the following: Is the dissertation process a site of 
cross-disciplinary faculty contact and activity within the iSchools? Can “acknowledgement analysis” be used to 
map or model some of that contact and activity? 
2 Using Acknowledgement Analysis to Study the Dissertation Process as a Faculty Contact Zone 


As indicated by its name, acknowledgement analysis draws obvious inspiration from the well-known method 
of citation analysis. It differs from citation analysis in that it privileges a published work’s acknowledgements 
section as a unique and revealing data source with the potential to reveal new types of information about 
the academic production process. Similar to a citation, an acknowledgement serves as a reward that can 
impact a faculty member’s career advancements (Sonnenwald, 2009). But citations and acknowledgements 
differ in that an acknowledgement is an (intentional) reward deemed valuable by an author yet 
acknowledgement “counts” have not (yet) emerged as an accepted tool for calculating someone’s influence 
and impact on a research field or discipline. Put simply, citations are commonly counted and have value 
but acknowledgements are not commonly counted and do not have the same value within the wider research 
community. Moreover, acknowledgement analysis remains relatively unexplored as a method. To date, the 
small amount of published literature using acknowledgement analysis has focused largely on identifying 
funders and funding patterns (e.g. Wang & Shapira, 2011). 

Our research considerably expands the use of acknowledgement analysis by focusing on the social 
networks underpinning the academic knowledge production process. What makes 
acknowledgements particularly rich as a data source is that they often traffic in a relatively 
“flat ontology” (DeLanda, 2005). For example, the acknowledgement section of a single dissertation might 
mention or thank: senior researchers, junior researchers, external committee members, parents, siblings, 
roommates, friends, peers, co-workers, lovers, ex-lovers, program administrators, lab managers, librarians, 
archivists, dogs, cats, grant managers, coffee shops, coffee shop workers, coffee makers, Vietnamese 
restaurants, Indian restaurants, research subjects, local bars, a bicycle, a laptop, an unborn niece, an entire 
graduate faculty, computer equipment, government officials, tech support, grandparents, postdoctoral 
researchers, funding agencies, information visualization experts, duplication shop managers, fellowship 
administrators, conference attendees, a curator, the ocean, a therapist, and more. According to 
acknowledgements sections, a dissertation often involves an intricate web of relationships that are far more 
complex than the term “academic social network” typically suggests. 

Despite these complexities, the research for this current study focused exclusively on human actors 
mentioned in iSchool dissertation acknowledgements within a one-year time frame, 2010. For convenience’s 
sake, and because we are presently more concerned with exploring acknowledgement analysis as a method 
than making final, defensible claims about the iSchools overall, we did not survey the entire iSchool caucus 
but instead chose the 15 schools or departments currently listed on the iSchools Directory 
(http://ischools.org/directory/) that conferred the most doctoral degrees between 1930 and 2007 (Sugimoto, 
Russell & Grant, 2009, Table 3). We performed a content analysis of iSchool dissertation acknowledgements 
and produced a social network analysis of the faculty members mentioned within every iSchool dissertation 
emerging from one of the iSchools in our sample. In order to collect dissertation acknowledgements in the 
sampled iSchools, we first utilized the MPACT database (http://www.ibiblio.org/mpact/) and obtained 
directory information including dissertation titles, schools, years, and author names. We then used author 
names and institutions as keywords to search the ProQuest Dissertations & Theses Database. Table 1 shows 
the number of dissertations in each sampled school in 2010, along with the actual number of 
acknowledgements we were able to access. 


iConference 2014 Brian Beaton et al. 

School        Sampled dissertations   Acknowledgements found   %
UNC           14                      13                       93
Pittsburgh    13                      6                        46
FSU           7                       6                        86
Rutgers       7                       5                        71
UIUC          7                       7                        100
UNT           7                       7                        100
Toronto       6                       4                        67
Michigan      5                       4                        80
Indiana       4                       2                        50
Syracuse      3                       2                        67
UCLA          3                       3                        100
Drexel        1                       1                        100
Maryland      1                       1                        100
UT-Austin     0                       0                        --
UW-Madison    0                       0                        --
Total         78                      61                       78


Table 1: The Number of Sampled Acknowledgements across iSchools in 2010 
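
The percentages in Table 1 follow directly from the two count columns. As a minimal sketch (the variable and function names are our own, and the UIUC row is taken as 7 sampled and 7 found, the values implied by the column totals and the 100% rate), the coverage per school and overall can be recomputed as follows:

```python
# (sampled dissertations, acknowledgements found) per school, from Table 1.
# The UIUC entry is an assumption derived from the totals (78 and 61).
table1 = {
    "UNC": (14, 13), "Pittsburgh": (13, 6), "FSU": (7, 6), "Rutgers": (7, 5),
    "UIUC": (7, 7), "UNT": (7, 7), "Toronto": (6, 4), "Michigan": (5, 4),
    "Indiana": (4, 2), "Syracuse": (3, 2), "UCLA": (3, 3), "Drexel": (1, 1),
    "Maryland": (1, 1), "UT-Austin": (0, 0), "UW-Madison": (0, 0),
}


def coverage(sampled, found):
    """Whole-number percentage of sampled dissertations with an accessible
    acknowledgement, or None when no dissertations were sampled."""
    return None if sampled == 0 else round(100 * found / sampled)


total_sampled = sum(sampled for sampled, _ in table1.values())
total_found = sum(found for _, found in table1.values())

for school, (sampled, found) in table1.items():
    print(school, sampled, found, coverage(sampled, found))
print("Total", total_sampled, total_found, coverage(total_sampled, total_found))
```

Schools with no dissertations in 2010 are given no percentage rather than zero, which mirrors the “--” entries in the table.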


Next, using the “discipline area” classification system developed by Wiggins and Sawyer (2012, p.11), we 
coded the dissertation authors and all of the faculty names appearing within the acknowledgements sections. 
We found 269 names classifiable as faculty, 16 funding sources, and 782 people in the personal category 
within 61 cases. Names that related to an academic institution in a staff capacity such as librarians, lab 
technicians, etc. were considered as personal acknowledgements and not included within our current study 
(see Table 2 for examples). Most information about acknowledged faculty was gleaned from CVs available 
from either departmental or personal web sites. Where this was not possible, alternate online profiles (e.g., 
LinkedIn) served as the source for information related to degree and institution. Only 4 instances occurred 
where no information could be located. 


#   Text                                                      Name            Relationship
                                                              acknowledged    (Type of assistance)

1   I would like to express my deepest gratitude to my        Dr. Joseph      academic
    advisor, Dr. Joseph Kabara, who constantly inspires,      Kabara
    encourages, and guides me through the problems in
    my research.

2   I would like to gratefully acknowledge Charles Lowry,     Charles Lowry   funding
    Director of Sales & Marketing - Incisive Legal
    Intelligence, a division of Incisive Media Inc.,
3   and Rob Calcagni, VP Client Solutions - Outsell, Inc.     Rob Calcagni    funding
    for granting access to some of their firms’ research
    studies which greatly influenced the quality of this
    dissertation.

4   ..Dan always has the courage to tell me what I need       Dan             personal
    to hear, even when I don’t want to hear it. I’m
    thankful for his patience while I finished this
    document, and I’m excited to face life’s next
    challenges with him.


Table 2: Coding Examples 
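
To illustrate how coded acknowledgements of this kind can be turned into discipline pairs, the following minimal Python sketch tallies directed pairs of author discipline area and acknowledged faculty discipline area. It is not the procedure used in the study, whose implementation is not described here; the record layout, field names, and all example values are hypothetical.

```python
from collections import Counter

# Hypothetical coded records: one entry per faculty name found in an
# acknowledgements section. All identifiers and values are invented.
coded_mentions = [
    {"dissertation": "D1", "author_area": "information", "faculty_area": "computing"},
    {"dissertation": "D1", "author_area": "information", "faculty_area": "information"},
    {"dissertation": "D2", "author_area": "library", "faculty_area": "information"},
    {"dissertation": "D2", "author_area": "library", "faculty_area": "library"},
    {"dissertation": "D3", "author_area": "information", "faculty_area": "computing"},
]

# Count directed (author area -> acknowledged faculty area) pairs.
pair_counts = Counter(
    (mention["author_area"], mention["faculty_area"]) for mention in coded_mentions
)

for (source, target), occurrences in pair_counts.most_common():
    print(f"{source} -> {target}: {occurrences}")
```

Keeping the pairs directed (author first, acknowledged scholar second) preserves the distinction between, for example, information-library and library-information contacts.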
3 Problems and Challenges with Using Acknowledgement Analysis 


Various problems and challenges arose during our research process that shed light on the promise of 
acknowledgement analysis as a method, and on its potential limitations or pitfalls. One challenge that we 
encountered is that some dissertations contained no acknowledgements. This prompted secondary reviews 
of all instances where no acknowledgement had been found. No errors were discovered. A second challenge 
that we encountered is that naming conventions in dissertations are not universal. While most dissertations 
contained explicit “Dedications” and “Acknowledgements” sections, some authors employed alternate 
headings for their acknowledgements section such as “Preface.” In terms of future research attempting to 
make use of acknowledgement analysis, this small but meaningful variation in naming conventions is likely 
to be found in other academic knowledge products that engage in acknowledgement activities, such as books 
and journal articles. Moreover, there exists the possibility that acknowledgement culture has changed over 
time and varies across academic formats. For example, some emergent forms of (digital) scholarly practice 
may not contain acknowledgements or their equivalent. Whether and how acknowledgement practices vary 
across formats or are changing along with recent developments in scholarly communications is a topic 
outside the scope of this paper but likely represents a research gap and opportunity ripe for future study. 


4 Findings, Discussion and Next Steps 


Our early and partial results suggest that the iSchool dissertation process is indeed a faculty “contact zone” 
within the iSchools, and that acknowledgement analysis can be used to map or model some of that cross- 
disciplinary contact and activity. For example, in 2010, there was a particularly traceable intermixing 
between “information” faculty and “library” faculty via the dissertation production process. Table 3 displays 
the distribution between disciplines as related to information authors and library authors, respectively. 
Perhaps unsurprisingly, no two iSchool discipline categories came into as much contact via the dissertation 
contact zone as “library” and “information.” Of the 380 interaction pairs, 38.9% (N=148) fell 
within these two disciplines, including same-discipline combinations (i.e., information-information, N=53, 
and library-library, N=34) and cross-discipline combinations (information-library, N=29, and 
library-information, N=32). 


Rank   Source (Author)   Target (Scholar)        Occur.   %
1      Information       Information             53       20.7
2      Information       Science & Engineering   38       14.8
3      Information       Social & Behavioral     29       11.3
3      Information       Library                 29       11.3
5      Information       Computing               27       10.5
6      Information       Management & Policy     26       10.2
7      Information       Humanities              20       7.8
8      Information       Communication           17       6.6
8      Information       Education               17       6.6
       Total                                     256      100

Rank   Source (Author)   Target (Scholar)        Occur.   %
1      Library           Library                 34       27.4
2      Library           Information             32       25.8
3      Library           Computing               14       11.3
4      Library           Education               12       9.7
5      Library           Humanities              9        7.3
6      Library           Social & Behavioral     8        6.5
7      Library           Communication           6        4.8
8      Library           Science & Engineering   5        4.0
9      Library           Management & Policy     4        3.2
       Total                                     124      100


Table 3: Number of pairs of discipline interactions between authors and scholars (N=380) 
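
The share of pairs that stay within the “information” and “library” areas can be checked against the occurrence counts in Table 3. The sketch below is a simple recomputation; the dictionary and variable names are our own.

```python
# Occurrences of (author area -> acknowledged faculty area) pairs, from Table 3.
information_authors = {
    "Information": 53, "Science & Engineering": 38, "Social & Behavioral": 29,
    "Library": 29, "Computing": 27, "Management & Policy": 26,
    "Humanities": 20, "Communication": 17, "Education": 17,
}
library_authors = {
    "Library": 34, "Information": 32, "Computing": 14, "Education": 12,
    "Humanities": 9, "Social & Behavioral": 8, "Communication": 6,
    "Science & Engineering": 5, "Management & Policy": 4,
}

total_pairs = sum(information_authors.values()) + sum(library_authors.values())

# Pairs whose source and target both fall in "information" or "library".
within_two_areas = (
    information_authors["Information"] + information_authors["Library"]
    + library_authors["Library"] + library_authors["Information"]
)

share = 100 * within_two_areas / total_pairs
print(total_pairs, within_two_areas, round(share, 1))  # prints: 380 148 38.9
```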
Something unanticipated within our results, however, is a further suggestion that cross-disciplinary 
engagement, at least within the contact zone of dissertations, is not occurring evenly. Figure 1 illustrates 
the divisions of the acknowledged faculty across the disciplinary classification scheme. Patterns of contact 
differ between the library authors (in the blue area) and the information authors (in the tan area), 
respectively. This shows that dissertations that involve students who can be classified as falling within the 
“information” discipline (N=39) using the Wiggins-Sawyer scheme not only link with faculty from a larger 
number of other disciplines (e.g., science & engineering, social & behavioral, management & policy, etc.) 
than “library” authors (N=22) but that such cross-disciplinary contacts were also more frequent. In other 
words, dissertations involving an information dissertator are more likely to become cross-disciplinary 
exercises in new knowledge creation, and to do so more often, bringing faculty from distinct disciplines into 
synchronous or asynchronous contact with one another. Because our research did not limit itself to formal 
dissertation committee membership but included any faculty member listed within a dissertation’s 
acknowledgements, this result is unlikely to be an effect of local rules and policies about dissertation 
committee composition. 
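
One possible way to put numbers on this difference, offered here only as an illustrative sketch, is to compare the share of cross-area pairs and the number of cross-area pairs per dissertation author in each group. The totals come from Table 3 and the author counts (N=39 and N=22) from the text; the two measures themselves and all names in the code are our assumptions, since the text does not state which measure of frequency was used.

```python
# Pair totals from Table 3 and author counts from the text.
groups = {
    "information": {"pairs": 256, "same_area_pairs": 53, "authors": 39},
    "library": {"pairs": 124, "same_area_pairs": 34, "authors": 22},
}


def summarize(group):
    """Return cross-area pair count, its share of all pairs (percent),
    and cross-area pairs per dissertation author."""
    cross = group["pairs"] - group["same_area_pairs"]
    return {
        "cross_area_pairs": cross,
        "cross_area_share_pct": round(100 * cross / group["pairs"], 1),
        "cross_area_pairs_per_author": round(cross / group["authors"], 2),
    }


for name, group in groups.items():
    print(name, summarize(group))
```

Under these assumed measures, the information group shows both a higher share of cross-area pairs and more cross-area pairs per author than the library group, which is consistent with the pattern described above.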


[Figure 1. Axis labels: Communication, Computing, Education, Humanities, Information, Library, Management & Policy, Science & Engineering, Social & Behavioral. Scale marks: 10%, 15%, 20%, 30%. Series: Information Authors, Library Authors.]

Figure 1: Distribution of pairs of disciplines, grouped by information authors and library authors 
Figure 2: Distribution of pairs of disciplines, grouped by information authors and library authors (Alternate 
rendering of Figure 1) 


There are several possible directions for future research. Widening our sample to include all of the iSchools 
would allow us to make more comprehensive claims about the iSchool community and the role of the 
dissertation process as a cross-disciplinary contact zone. Expanding the chronological scope of our research 
to include past years would allow for longitudinal analysis. We could begin with targeted intervals (e.g., 
1990, 1995, 2000, 2005) to map broader cross-disciplinary trends within the dissertation contact zone 
followed by a year-by-year analysis that reveals change over time in a more refined fashion. Finally, the 
dissertation process contact zone could be compared to some of the other likely cross-disciplinary faculty 
contact zones listed above (e.g., hiring committees, governance bodies, etc.) to begin cross-zone comparison 
and identify which zones are the liveliest and most successful intellectual diversity hotspots at the iSchools. 

Although we concede, in theory, that the mere presence of faculty from different disciplines affords 
greater opportunities for cross-disciplinary collaborations, we echo the call made by Wiggins and Sawyer 
that more empirical research is needed to make sense of intellectual diversity within the iSchools. 
De-privileging physical co-location, this note approached the iSchools as institutions comprised of many small 
experiments in cross-disciplinary activity: everyday “contact zones” in which faculty intermix 
intellectually. This note, which focused on the dissertation production process as one such contact zone, 
also had a secondary “proof of concept” agenda pertaining to the use of “acknowledgement analysis” as a 
method. The preliminary findings presented here suggest that dissertations do bring faculty from different 
“discipline areas” into synchronous or asynchronous contact with one another, and that acknowledgement 
analysis can reveal some of those cross-disciplinary intermixings. However, such intermixings are not 
happening evenly across the discipline areas. For example, in 2010, the dissertation production process in 
the discipline area of “library” was intellectually diverse with less frequency and with a narrower range of 
pairing disciplines than the dissertation production process in “information.” 
Abstract 

The Boston Marathon bombing event presents a rare opportunity to study how a massive disruptive 
event triggers emotional contagion. In this work, we use over 180 million geocoded tweets over an entire 
month to study how Twitter users expressed shared fear, comfort and community identity, over time 
and across different cities following the bombings. We quantify the level of shared fear using 
sentiment and time-series analyses. The expressions of comfort and community identity are studied based 
on the emergent use of two hashtags widely adopted after the bombings: #prayforboston and 
#bostonstrong. We found that these emotional responses varied with geographical distance from 
the Boston area. However, statistical analyses show that users’ direct experience of being in Boston 
better predicts shared fear, while users’ social networks are more effective in predicting 
expressions of comfort and community identity. Our study has implications for identifying potentially 
vulnerable populations and for predicting the perceived threat in the face of future massive disruptive events 
such as terrorist attacks. 


Keywords: emergency response, collective activity, crisis management, emergent use of communication tools, social media and 
1 Introduction 


The bombing at the Boston Marathon on April 15, 2013 resulted in 3 deaths and more than 250 casualties. 
Over the subsequent week, the search for and apprehension of the suspects resulted in an area-wide manhunt 
and “lockdown” of Boston and neighboring suburbs. 

Local and national news outlets continuously reported on the tragedy and the on-going threat. 
Intensive social media discussions were triggered by news reporting and by people who witnessed or participated 
in the event in various ways. Over the first week, Boston area residents were immersed in the stories 
and aftermath of the bombings. 

The news and social media response was soon intermingled with prideful news commentary about 
the heroic responses of Bostonians. Boston-area residents continue to be reminded of the event by media 
reports of the alleged bomber who was tried in a Boston court and “Boston Strong” community events such 
as the “Run to Remember” memorial foot-race and music concerts. Outside Boston, people showed support 
for Boston and the victims of the bomb attack. In New York, Yankees fans stood in the baseball stadium 
singing “Sweet Caroline,” the Boston Red Sox anthem. In Chicago, more than 200 runners gathered for a 
run of solidarity. A week after the Boston Marathon, thousands of marathon runners in London wore a 
black ribbon in solidarity with the people of Boston. People around the country and around the world expressed 
concern and comfort through Facebook and Twitter. 

The marathon bombing presents a rare opportunity to examine how a serious, real-life, community- 
wide threat stirs up a shared perception of risk, a sense of empathy, as well as a sense of togetherness, 
solidarity or community identity. 

In this work, we use Twitter communications related to the Boston bombings to study the extent 
to which people share fear and express comfort and community identity with the affected population during a 
massive destructive event. Social media sites like Twitter and Facebook have shown profound increases in 
traffic and information sharing during major events. The widespread use and semi-transparency of social 
media, Twitter in particular, makes people’s public expressions more available to analysis than was 
conceivable a few years ago. On one hand, Tweet communication streams integrate the representative scope 
of polls and surveys with the free-form responses of focus groups and interviews; on the other hand, Twitter 
users are embedded within their usual social contexts rather than artificial contexts created by polls, focus 
groups, and other survey methods. Its scope and sensitivity have thus made Twitter an attractive 
means of measuring and assessing the responses of the public to events and information (Lin, Margolin, 
Keegan, & Lazer, 2013). 

Twitter data have been mined in real time for temporal cues during political events (O’Connor, 
Balasubramanyan, Routledge, & Smith, 2010; Lin, Margolin, Keegan, Baronchelli, & Lazer, 2013), economic 
events (Bollen, Mao, & Zeng, 2011; O’Connor et al., 2010), and sports events (Nichols, Mahmud, & Drews, 
2012). In particular, the massive outpouring of political communication on social media sites permits 
analysis of sentiment and topics during political events such as elections (O’Connor et al., 2010; Tumasjan, 
Sprenger, Sandner, & Welpe, 2010) and debates (Lin, Margolin, Keegan, & Lazer, 2013; Metaxas & 
Mustafaraj, 2012; Diakopoulos & Shamma, 2010). Twitter data have also been utilized as early detection 
systems for emerging public health problems (Aramaki, Maskawa, & Morita, 2011; Chew & Eysenbach, 
2010; de Quincey & Kostkova, 2010). There has been considerable effort leveraging Twitter data for real- 
time emergency detection (Sakaki, Okazaki, & Matsuo, 2010; Guy, Earle, Ostrum, Gruchalla, & Horvath, 
2010; Earle, Bowden, & Guy, 2012) and crisis management (Caragea et al., 2011; Li & Rao, 2010; Mendoza, 
Poblete, & Castillo, 2010). The use of Twitter during disaster events has been examined (Hughes & Palen, 
2009; J. Sutton, Palen, & Shklovski, 2008; J. N. Sutton, 2010), and studies have shown that Twitter supports 
backchannel communication to address the information dearth problem in the face of disaster, though the 
spread of misinformation is also a concern (J. Sutton et al., 2008; J. N. Sutton, 2010). A recent study 
suggested that social media may also offer psychological benefits to affected populations who 
participate in social media conversations (Keim, 2011). 

In the context of an emergency or disaster, disruptive events refer to events which interrupt the 
normal functions of a community or business, and may result in harm (McAslan, 2011). In a disruptive 
event, a layperson’s assessment of potential harm (i.e., perceived risk) is as important as the actual 
magnitude of harm assessed by experts, because people respond to their perceived risk, rather than the 
actual likelihood and severity of harm (McAslan, 2011), and their judgments can be biased by the perceived 
threat (Baumann & DeSteno, 2010). Using the case of Hurricane Katrina, Comfort and Haase showed how policy 
makers who did not recognize the severity of the threat and its likely consequences failed to communicate 
the urgency of the danger to their respective agencies (Comfort & Haase, 2006). 

The consequences of extraordinarily upsetting events have a profound impact on people with 
direct or indirect exposure to the events and can affect how people function over time (Maguen, Papa, & 
Litz, 2008). For example, after the events of 9/11, although individuals may not have PTSD (post-traumatic 
stress disorder) or depression, they may ride the bus or fly less frequently or reduce social interactions with 
other people in public. The immediate and consequential psychosocial vulnerability may be overcome by 
community resilience, which has been characterized by prior research as a typical collective behavior, such 
as the emergent togetherness, solidarity, unity or “community spirit” observed in the 2005 London bombings 
(Drury, Cocking, & Reicher, 2009). Community resilience, including comforting and helping each other, is 
the ability of a human system to absorb disturbance and still retain its basic function and structure (Drury 
et al., 2009; McAslan, 2011). 

This work fills an important gap in prior research on the understanding of shared sense after a 
massive disruptive event. A massive disruptive event often has a far-reaching impact on a large population, 
including victims, witnesses, and people not directly affected by the event. Most prior work focuses on the 
psychological impact on the directly affected population or the communication within the local community 
(Drury et al., 2009; J. Sutton et al., 2008; J. N. Sutton, 2010). In this work, we attempt to characterize the 
Twitter communication within and outside the local community in response to the Boston bombing event. We 
examine the two key components, perceived threat and community resilience, within and beyond the local 
community in Boston during this massive disruptive event. Using Twitter communication streams, we study 
people’s perceived threat based on the detected fear expression in their public tweet messages, and we study 
community resilience based on the emergent use of words signaling comfort and community identity in their 
tweets. We believe the study will contribute to an understanding of how shared perception of risk and 
community resilience may be developed for potentially vulnerable population. 
We focus on three research questions: 


• To what extent do Twitter users express fear that shares common characteristics with that of the affected 
population? How can we characterize the level of shared fear? 

• To what extent do Twitter users express comfort and community identity? 

• What factors may explain different levels of shared fear, and different levels of interest in expressing 
comfort and community identity? 


We discuss our method and analysis results in the following sections. 
Figure 1: Tweet daily volume in dataset. The plot shows the daily number of geocoded tweets (volume) 
in our dataset. Tweets without geocodes are removed. From top to bottom: the total daily volume, the 
daily volume of tweets within the Boston area, and the daily volume of tweets within the direct affected 
region (DAR). The vertical dashed lines indicate the day of Boston bombings (April 15; black line) and the 
day of manhunt (April 19; gray line). The tweets posted on April 1, 10 and half-day on April 11 were 
missing due to data collection process errors. 


2 Method and Results 


2.1 Data Description 


This project uses nearly all geotagged tweets collected from the Twitter Streaming API (Mostak & Lewis, 
2012). Fig. 1 shows the total number of geocoded tweets (volume) per day over the month of April in the 
dataset. Tweets without geocodes (latitude and longitude) are removed. The vertical dashed lines indicate 
the day of the Boston bombings (April 15; black line) and the day of the manhunt (April 19; gray line). Due to 
data collection process errors, tweets posted on April 1, April 10 and half of April 11 are missing. In the 
following analysis, volumes on the three affected days are excluded when reporting aggregated statistics. The 
total volume before, during and after the bombing day is 6.59M tweets/day on average, 6.69M tweets, 
and 6.36M tweets/day on average, respectively. Within the Boston area, the volume before, during and after 
the bombing day is 42,370 tweets/day on average, 56,131 tweets, and 45,668 tweets/day on average, respectively. 

Following the news reports (What we know about the Boston bombing and its aftermath, 2013), we 
manually identified a direct affected region that covers the area of the two blasts. We refer to this 
approximately 0.55 km² direct affected region as “DAR.” Within DAR, the volume before, during and after 
the bombing day is 234 tweets/day on average, 832 tweets, and 331 tweets/day on average, respectively. 
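
The before/during/after volumes reported here can be reproduced from daily counts with a short calculation. The following is a minimal sketch, not the authors' pipeline: the function name `period_volumes` and the input layout (a mapping from calendar day to tweet count) are assumptions made for illustration, and the days with collection errors are simply skipped.

```python
from datetime import date

BOMBING_DAY = date(2013, 4, 15)
# Days affected by the data collection errors described in the text.
MISSING_DAYS = {date(2013, 4, 1), date(2013, 4, 10), date(2013, 4, 11)}


def period_volumes(daily_counts):
    """Summarize volume before, during and after the bombing day.

    daily_counts: dict mapping a date to the number of geocoded tweets
    (hypothetical input layout). Days with missing data are excluded.
    """
    before, after = [], []
    for day, count in daily_counts.items():
        if day in MISSING_DAYS:
            continue
        if day < BOMBING_DAY:
            before.append(count)
        elif day > BOMBING_DAY:
            after.append(count)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "before": mean(before),                      # tweets/day on average
        "during": daily_counts.get(BOMBING_DAY, 0),  # tweets on the day itself
        "after": mean(after),                        # tweets/day on average
    }
```

The same function can be applied to counts restricted to the Boston area or to DAR, since only the input changes.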


2.2 Detecting Fear 

To study users’ expression of fear, we use sentiment analysis to extract different sentiments from 
the text of users’ tweets. We use the concept-based affective lexicon SentiSense (de Albornoz, Plaza, 
& Gervas, 2012) to extract two kinds of sentiment, fear and joy. Examples of fear-related keywords 
include ‘fearful’, ‘unkind’, ‘craziness’, ‘crime’, ‘shudder’, ‘suffocate’, ‘dreadfully’, ‘terror’, ‘fatal’, ‘crash’, 
‘anxiously’ and ‘erupt’; joy-related keywords include ‘satisfied’, ‘cheerful’, ‘comfortableness’, ‘cruise’, 
‘pleased’, ‘happiness’, ‘joyful’, ‘belonging’, ‘exult’, ‘rejoicing’, ‘eagerly’ and ‘fortunate’. We compute the 
relative strength of a sentiment within a region as follows. Let $L$ be the list of all words in the sentiment 
lexicon, and $L_{\mathrm{fear}}$ and $L_{\mathrm{joy}}$ be the lists of fear- and joy-related words, respectively. The degree of a kind of 
sentiment $c \in \{\mathrm{fear}, \mathrm{joy}\}$ in a tweet $i$, denoted as $s_{i,c}$, is given by 

$$ s_{i,c} = \frac{|W_i \cap L_c|}{|W_i \cap L|}, $$

where $W_i$ is the words in the text content of tweet $i$. The sentiment index $S_{R,T,c}$ of a region $R$ within a 
particular time interval $T$ is given by 

$$ S_{R,T,c} = \frac{1}{m} \sum_{i:\, t_i \in T,\; g_i \in R} \left( \frac{s_{i,c}}{\sum_{c'} s_{i,c'}} \right), $$

where $t_i$ and $g_i$ are the timestamp and geocode of tweet $i$, respectively, and $m = |\{i : t_i \in T, g_i \in R\}|$. Based 
on the above calculation, the fear index (or joy index) is a normalized measure of the relative strength of 
fear (or joy) regardless of the number of tweets posted within a region and a time interval. 
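
The index can be illustrated with a small calculation: each tweet's score for a sentiment is the share of its lexicon words that belong to that sentiment, the scores are normalized across fear and joy, and the result is averaged over all tweets in the region and time interval. The sketch below rests on several assumptions: the word lists are a toy stand-in for the real lexicon, tweets are tokenized by whitespace, and tweets without any fear or joy word contribute zero (the text leaves that case open).

```python
# Toy lexicon (hypothetical stand-in, not the real lexicon contents).
LEXICON = {
    "fear": {"terror", "fatal", "crash", "fearful"},
    "joy": {"cheerful", "pleased", "joyful", "fortunate"},
    "other": {"angry", "sad"},  # lexicon words that are neither fear nor joy
}
TARGETS = ("fear", "joy")
ALL_WORDS = set().union(*LEXICON.values())


def tweet_scores(text):
    """Per-tweet degree of each target sentiment: |W ∩ L_c| / |W ∩ L|."""
    words = set(text.lower().split())
    hits = words & ALL_WORDS
    if not hits:
        return {c: 0.0 for c in TARGETS}
    return {c: len(words & LEXICON[c]) / len(hits) for c in TARGETS}


def sentiment_index(tweets, category):
    """Average normalized score of `category` over the tweets of one
    region and time interval (the caller is assumed to have filtered them)."""
    m = len(tweets)
    if m == 0:
        return 0.0
    total = 0.0
    for text in tweets:
        scores = tweet_scores(text)
        denom = sum(scores.values())
        if denom > 0:  # assumption: tweets with no fear/joy word add zero
            total += scores[category] / denom
    return total / m
```

For example, `sentiment_index(tweets, "fear")` on the tweets of one hour within DAR would give one point of an hourly fear curve.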

Fig. 2 shows the hourly fear and joy indices within DAR and the City of Boston, from April 10 to 30. 
Compared with the joy indices, the fear indices exhibit greater sudden increases around April 15 and April 19, 
corresponding to the times of the blasts and the subsequent manhunt. The substantial difference between the 
two indices suggests that a particular emotional expression, fear, was triggered in response to the event. 
Figure 2: Sentiment indices over time in (a) the direct affected region (DAR) and (b) the City 
of Boston. The hourly sentiment indices of fear and joy before, during and after the Boston bombings are 
shown in smooth fitted curves over time (in UTC) with shaded area indicating a 95% confidence region. 
The vertical dashed lines around April 15 and 19 indicate the times of bombings and manhunt, respectively. 
The spikes in the level of fear correspond to the Boston bombings and the subsequent events, while the 
level of joy does not exhibit a sudden increase and is relatively stable over the period. 


In Fig. 3 we show the sentiment indices in Boston and three other cities, New York City, Washington and 
Chicago, over the same time period. We observe similar, though slightly weaker, spike patterns in all three 
cities. In these cities, the first peaks on April 15 have a six- to eight-hour delay compared with the first 
peak in Boston. The highest fear level in Boston is at least 1.5 times the fear level in the other cities. This 
indicates that the local community in Boston had a stronger and quicker emotional response. 

We observe a small increase of joy in DAR on April 22, which corresponds to the day when the 
suspect, Dzhokhar Tsarnaev, was charged (Markon, Horwitz, & Johnson, 2013). The increase of joy is not 
obvious within the City of Boston and other major cities. This may suggest that the local community in 
the directly affected region paid closer attention to the bombing-related events. 

To understand the extent to which a city shares the fear in response to the event, we compute the 
fear correlation between Boston and the given city as the correlation between the two cities’ 
time series of fear indices. Fig. 4(a) shows the fear correlation between the City of Boston and other major 
cities. The cities are ordered from top to bottom based on the correlation values. Some non-US 
cities, including London, Paris and Moscow, exhibit a higher level of fear correlation than many US cities, 
suggesting that the level of shared fear may vary depending on various social, economic and political connections 
among the cities. 
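
As an illustration of the fear correlation, the sketch below correlates a city's fear time series with Boston's and ranks cities by the result. The text does not name the correlation coefficient, so the use of Pearson's coefficient here is an assumption, as are the function names and the assumption that all series are sampled at the same time points.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def rank_by_fear_correlation(boston_series, city_series):
    """Return (city, correlation) pairs, highest correlation first.

    city_series: dict mapping city name to its fear-index time series.
    """
    scores = {city: pearson(boston_series, s) for city, s in city_series.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```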

In Fig. 5(a) we plot the cities’ fear correlation with Boston against their geographical distance from 
Boston. The decline of correlation with distance suggests that the ripple of shared fear may vary with the 
geographical proximity between cities. 
Figure 3: Sentiment indices over time in (a) Boston, compared with those in (b) New York 
City, (c) Washington and (d) Chicago. Similar but slightly weaker spike patterns were observed in 
New York City, Washington and Chicago over the same time (in UTC). 
Figure 4: Expression of fear, comfort and community identity in response to the Boston 
bombings. (a) The temporal correlation of detected fear in a city with respect to the fear in Boston. Non-US 
cities are colored in dark gray. (b) The total number of tweets (volume) containing the #prayforboston 
hashtag in a city. (c) The total number of tweets (volume) containing the #bostonstrong hashtag 
in a city. In all three plots, the cities are ordered based on the variable of interest (x-value), from the 
highest to the lowest. 


2.3 Comfort and Community Identity 


In Twitter, hashtags are ubiquitous and flexible annotations, allowing users to track ongoing conversations, 
signal membership in a community, or communicate non-verbal cues like joy and sadness. Hashtags often 
reflect eccentric topics and their emergence is happenstance. Over the two weeks of the Boston Marathon, 
the bombings and subsequent events, we observe that new hashtags such as #bostonmarathon, #prayforboston and 
#bostonstrong were created and quickly adopted by many users in their tweet conversations. These 
hashtags serve different conversational purposes. For example, #bostonmarathon is a topical hashtag that was most 
popular on April 15 and was used mainly in the Boston area to mark Boston-Marathon-related 
conversations. 

We focus on the emergent use of the hashtags #prayforboston and #bostonstrong. The two hashtags 
were widely adopted after the bombings. The first hashtag, #prayforboston, became popular immediately 
after the bombings and was used by both Boston and non-Boston users to send comfort messages to Bostonians. 
The second hashtag, #bostonstrong, was popularized two days later and gained its highest popularity around 
April 20 due to the “Boston Strong” community events. This hashtag reflects a sense of community identity 
among Bostonians. 

Interestingly, the two hashtags also appeared widely in tweets from other cities. Fig. 4(b,c) shows 
the number of tweets containing the two hashtags. Outside Boston, the level of interest among Twitter users 
in expressing comfort (in terms of #prayforboston volume) and community identity (in terms of 
#bostonstrong volume) varies across cities. The hashtag #prayforboston had a wider reach than the hashtag 
#bostonstrong: among the 25 cities considered in this analysis, #prayforboston appeared in 23 cities, while 
#bostonstrong appeared in only 17 cities. Similar to the detected fear, the use of the two hashtags exhibits 
a ripple effect corresponding to the geographical proximity of cities, as shown in Fig. 5(d,g). 
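
The hashtag volume per city, and the number of cities a hashtag reached, can be sketched as a simple count. This is an illustrative sketch only: the input layout (pairs of city and tweet text), the case-insensitive matching and the function name `hashtag_volume` are assumptions, not details given in the text.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_volume(tweets, tag):
    """Count, per city, the tweets that contain the given hashtag.

    tweets: iterable of (city, text) pairs (hypothetical input layout).
    A tweet counts once even if it repeats the hashtag.
    """
    wanted = tag.lower().lstrip("#")
    volume = Counter()
    for city, text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if wanted in tags:
            volume[city] += 1
    return volume
```

The reach of a hashtag is then `len(hashtag_volume(tweets, "#prayforboston"))`, the number of cities with at least one matching tweet.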


2.4 Social Networks and Personal Visits 


To further understand the ripple effect of expressing fear, comfort and community identity, we study these 
expressions in relation to other social factors. We identify two social factors: 


• Social tie is the strength of social connections between Boston and a given city. We quantify the 
social tie strength between two cities A and B based on the number of replies sent within 
approximately two weeks before the event (from April 2 to 14), with the condition that the reply 
sender and receiver were observed in cities A and B or B and A on the same day as the reply. A 
user is observed in a city if the user posts a tweet with a geocode within the region of the city. 

• Personal visit is the amount of travel users made between cities. We use personal visit between 
Boston and a given city to quantify the direct experience or actual familiarity of being in the City 
of Boston. We first extract a transition flow for each individual user within the two-week pre-event period 
(from April 2 to 14). The transition flow is a temporally ordered list of cities where the given user 
was observed through geocoded tweets. The amount of travel between two cities A and B is then 
measured based on the number of transitions between A and B or B and A by aggregating all 
individual transition flows. 

In addition to the two social factors, we consider the following two control variables: 

• Geo-distance is the geographical distance between Boston and a given city, measured in kilometers. 

• Tweet activity is the expected Twitter activity of a city regardless of the event. This quantity serves 
as a baseline variable when explaining the level of response to the bombing events. We quantify this 
baseline tweet activity by the number of tweets posted from a city within the two-week pre-event 
period. 
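
The two social factors can be sketched as counting procedures over geocoded observations. The code below is a minimal illustration under stated assumptions: the record layouts (`(sender, receiver, day)` for replies, `(user, day, city)` for sightings, `(user, timestamp, city)` for observations) and the function names are hypothetical, and a transition is counted whenever two consecutive observations of the same user fall in the two different cities.

```python
from collections import defaultdict


def social_tie(replies, sightings, city_a, city_b):
    """Number of replies whose sender and receiver were observed in
    city_a and city_b (in either order) on the day of the reply.

    replies: iterable of (sender, receiver, day)
    sightings: set of (user, day, city) derived from geocoded tweets
    """
    count = 0
    for sender, receiver, day in replies:
        a_to_b = (sender, day, city_a) in sightings and (receiver, day, city_b) in sightings
        b_to_a = (sender, day, city_b) in sightings and (receiver, day, city_a) in sightings
        if a_to_b or b_to_a:
            count += 1
    return count


def personal_visits(observations, city_a, city_b):
    """Number of transitions between city_a and city_b over all users.

    observations: iterable of (user, timestamp, city); each user's
    transition flow is the time-ordered list of observed cities.
    """
    flows = defaultdict(list)
    for user, _, city in sorted(observations, key=lambda o: (o[0], o[1])):
        flows[user].append(city)

    pair = {city_a, city_b}
    count = 0
    for cities in flows.values():
        for prev, cur in zip(cities, cities[1:]):
            if prev != cur and {prev, cur} == pair:
                count += 1
    return count
```

Both functions would be applied only to records from the pre-event window, so that the factors are measured independently of the response to the event.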


In Fig. 5 we plot the fear correlation and the volumes of #prayforboston and #bostonstrong against the 
geo-distance, social tie and personal visit between Boston and other cities. The fear correlation is computed 
based on fear indices between April 10 and 20. The volumes of hashtags are calculated as the total number of 
tweets containing the hashtags posted between April 15 and 30. The first-order correlations are shown on 
top of each scatterplot. Among the three factors, personal visit has the highest association with shared fear 
and the volume of #bostonstrong, while social tie has the strongest relation with the volume of 
#prayforboston. The first-order correlations suggest that the strength of social ties may well predict people’s level 
of interest in expressing comfort during the event. On the other hand, the number of personal visits to the 
city may serve as a strong predictor of the level of shared fear and the level of interest in expressing community 
identity. 

We use multivariate linear regression analysis to examine the impact of different factors. We 
examine linear models for three response variables: the level of shared fear, the volume of #prayforboston 
(comfort) and the volume of #bostonstrong (community identity). Using different combinations of 
predictors, we test the following models: 

• baseline model: has a single predictor, tweet activity 

• geo model: has two predictors, tweet activity and geo-distance 

• social model: has two predictors, tweet activity and social tie 

• visit model: has two predictors, tweet activity and personal visit 

• geo-social model: has three predictors, tweet activity, geo-distance and social tie 

• geo-visit model: has three predictors, tweet activity, geo-distance and personal visit 

• full model: has four predictors, tweet activity, geo-distance, social tie and personal visit 


We report the out-of-sample R² in Table 1. The results indicate that personal visit is a strong predictor of 
the level of shared fear between Boston and a given city. When predicting the shared fear, the visit model 
outperforms the social and geo models by 22% and 104%, respectively. In predicting the interest in expressing 
comfort and community identity, both social tie and personal visit are strong predictors. The social and 
visit models outperform the geo model by at least 62%. Social tie has slightly higher predictive power than 
personal visit. The social model improves on the visit model by 5.8% in predicting the volume of #bostonstrong, 
and by 6.8% in predicting the volume of #prayforboston. Due to the small sample size (25 cities in total), 
the models with more predictors are likely overfitting. 
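
To make the model comparison concrete, the sketch below fits an ordinary least-squares model and scores it out of sample. The text does not state how the out-of-sample R² was obtained, so leave-one-out cross-validation is an assumption here, as is the definition R² = 1 − SS_res / SS_tot computed on held-out predictions; all function names are hypothetical. The last helper reproduces the relative comparisons quoted in the text from two R² values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def loo_r2(rows, y):
    """Leave-one-out R²: each city is predicted by a model fit on the others."""
    preds = []
    for i in range(len(y)):
        coef = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(coef[0] + sum(c * v for c, v in zip(coef[1:], rows[i])))
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def relative_gain(r2_model, r2_reference):
    """Relative improvement of one model's R² over another's."""
    return (r2_model - r2_reference) / r2_reference
```

Here `rows` holds one list of predictor values per city (for example tweet activity and personal visit for the visit model) and `y` the response. With the shared-fear values of Table 1, `relative_gain(0.260, 0.213)` is about 0.22 and `relative_gain(0.260, 0.127)` is about 1.05, matching the reported 22% and 104% up to rounding.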


                 baseline   geo     social   visit   geo-social   geo-visit   full 
shared fear      0.087      0.127   0.213    0.260   0.160        0.189       0.133 
#prayforboston   0.090      N/A     0.512    0.479   0.191        0.176       0.120 
#bostonstrong    0.335      0.340   0.577    0.545   0.104        0.398       0.259 


Table 1: Prediction performance. Out-of-sample R² for models predicting shared fear, volume of 
#prayforboston (comfort) and volume of #bostonstrong (community identity). 


3 Discussion and Future Work 


In this work, we characterize the Twitter communication among the local and global communities during 
and after a massive disruptive event, the Boston bombings. Drawing from the crisis management literature, 
we identified two key components in a massive disruptive event: perceived threat and community 
resilience. Using about 180 million tweets (nearly all geotagged tweets), we study Twitter users’ perceived threat in 
terms of the fear expressed in their tweets and compute the temporal correlation between Boston and other 
cities. We study community resilience based on the emergent use of two hashtags widely adopted after the 
bombings: #prayforboston, signaling comfort, and #bostonstrong, signaling solidarity and community 
identity. We observed that the level of shared fear and the interest in expressing comfort and community 
identity vary with the geographical proximity between cities. Using correlation and linear regression 
analyses, we found that users’ direct experience of being in the city of Boston, quantified in terms of the 
amount of travel to Boston, predicts the shared fear better than other factors. In terms of predicting the 
interest in expressing comfort and community identity, both social network ties, measured based on reply 
tweets, and direct experience outperform geographical proximity by at least 62%. Our analyses have 
implications for identifying potentially vulnerable populations and predicting the perceived threat in the face 
of future massive disruptive events such as terrorist attacks. 

Our current work has several limitations. First, we rely on a concept-based affective lexicon to 
extract fear expression. The lexicon cannot capture fear signals in non-English words or in more dynamic 
forms (such as emoticons, acronyms and community-invented expressions). Second, we focused on 25 major 
US and non-US cities. Our results may not generalize to the majority of smaller cities or neighboring cities. 
As part of future work, we plan to (1) develop a more sophisticated fear detection framework for extracting 
shared fears, (2) characterize the shared fears at multiple scales of time and space, (3) extend our analyses to 
a more comprehensive sample of large, medium and small US cities, and incorporate the social, political 
and economic attributes of these cities in the analyses, (4) use human-coded content analysis to analyze the 
communication patterns in a smaller sample (e.g., focus on tweets posted in the direct affected region) to 
triangulate the analyses from the big data sample, and (5) compare the Boston bombing event with other types 
of massive disruptive events such as hurricanes and school shootings. 
Figure 5: The expression of fear, comfort and community identity in relation to geo-distance, 
social ties and personal visits. (a-c) are scatterplots of the fear correlation with Boston against distance 
in kilometers, number of replies, and number of personal visits, respectively. (d-f) are scatterplots of 
#prayforboston volume (total number of tweets containing the hashtag) against the three factors. (g-i) 
are scatterplots of #bostonstrong volume against the three factors. In all plots, non-US and US cities are colored 
in red and blue, respectively. The correlation coefficients are reported on top of the scatterplots except 
for (d,g), which are influenced by outliers. 
Abstract 

Aim 

The aim of this research is to focus on geography and its sub-disciplines with the intention of exploring 
how the nature of a discipline shapes current and potential research data management practices and, in 
turn, how disciplines themselves are being reshaped by changes in data creation, use and management. 

Design 

The research is in two sequential phases. Phase 1 consists of a scoping stage which includes a web-based 
study using different techniques, such as link analysis and bibliometrics, and interviews with data 
management experts. Phase 2 is primarily based on a series of interviews with researchers, investigating 
researchers' research practices and attitudes towards data management. 

Findings 

By the time of the conference preliminary results from phase 1 of the research will be available for 
reporting. 

Value 

This research will provide a better understanding of how the complexity of research data, and so the 
challenges for improving research data management, are grounded in the underlying nature of 
disciplines/sub-disciplines. As a by-product of this understanding, it will enhance the conceptual theory 
of disciplinarity through detailed analysis of changes around research data management in geography 
and its sub-disciplines. 
1 Context 


Research data management has become an increasingly important topic in information science (Borgman, 
2012; Corrall, 2012). Data is a crucial part of research. Today many research councils and funders worldwide, 
such as the funding councils and the Wellcome Trust in the UK, the National Science Foundation and 
National Institutes of Health in the US, and the Australian Research Council require data to be managed 
according to best practices and standards. As a consequence, Higher Education Institutions and researchers 
are under pressure to manage their data better. In the UK, the British government has invested £10 million 
to create and support the Open Data Institute which aims to promote open data culture and drive 
innovation and growth in the UK. It is evident that policy-makers and research funders believe that effective 
data management is the foundation for good research and that opening up data will benefit society. In 
addition, a variety of benefits to researchers that arise from effective management of research data have 
been identified, e.g. verification of research, stimulating new collaborations, transferring knowledge to 
industry, sharing and reuse of data, reducing future preservation costs etc. (Fry et al, 2009; DCC, 2013). 
However, not all these benefits are relevant to all disciplines and all types of data. 

One of the key challenges in managing research data is the complexity and diverse nature of data, 
coupled with diverse data practices across disciplines. The types of data produced within disciplinary 
communities can vary greatly, from static numeric and text-based data to dynamic multimedia data. 
Consequently, even the concept of data is difficult to define (Borgman, 2012). 

There are an increasing number of studies of research data management in the disciplinary context. 
For example, in the UK, the Research Information Network (RIN) and Digital Curation Centre (DCC) 
SCARP project have investigated disciplinary attitudes and approaches to data curation (Lyon et al, 2010; 
RIN, 2008). All projects suggest there are not only variations in kinds, file formats, value and long term 
viability of research data in different disciplines but also significant differences in data practices and culture 
of sharing data. For example, according to the RIN report (2008), re-use of data and data sharing is a norm 
in astronomy, while in climate science, the culture of data sharing and publication is not strong. It is evident 
that practices of data creation, use and sharing vary widely across subject disciplines and their sub- 
disciplines. However, these differences are actually shaped by something more fundamental, the nature and 
the complexity of disciplines. Thus, this research will attempt to provide insights into the process and the 
practices of knowledge communities and explore how these shape research data management practices. 

The nature and use of research data is strongly related to disciplines (Pryor, 2009; Cragin et al., 
2010). Effective support of research data management, therefore, is dependent on a detailed understanding 
of the characteristics of disciplines and data creation and use within disciplinary communities. Terms such 
as ‘specialization’, ‘fragmentation’, ‘hybridity’ and ‘fluidity’ have been used to describe the dynamic nature 
of disciplinarity (Klein, 1996; Dogan & Pahre, 1990). As knowledge becomes increasingly interdisciplinary, 
some disciplines borrow concepts and methods from other disciplines and become hybrid disciplines or 
sub-disciplines. Due to the dynamic nature of disciplines, data are highly specialized, fragmented and 
diverse. As knowledge production becomes more and more interdisciplinary in nature, the hybridity and 
fluidity of disciplines also create complexity in relation to research data management. 

Disciplines provide basic structures for organising knowledge and research in Higher Education 
Institutions. Becher and Trowler (2001) developed a matrix of disciplinary groupings, namely hard-pure, 
soft-pure, hard-applied and soft-applied. 
[Matrix quadrants legible in the figure: Hard-applied (Engineering); Soft-applied (Social Sciences)] 


Figure 1: Becher's matrix of disciplinary cultures (1987) 


Becher and Trowler (2001) use the terms hard/soft and pure/applied to refer to intellectual differences in 
disciplines, e.g. research problems, research objects, and methods. They also use terms such as 
convergent/divergent and rural/urban to refer to differences in social structures, e.g. community culture, 
communication patterns, and reward systems. For instance, Becher and Trowler (2001) classify history as 
a soft-pure rural discipline, which typically uses interpretative methods and in which researchers tend to 
work independently, while high energy physics is classified as a hard-pure urban discipline, which typically 
favours quantitative methods and large scale collaborations. It appears that history is more likely to 
generate small scale data, e.g. interview data, than the large computational datasets often generated in 
disciplines such as high energy physics. In general, computational datasets are usually highly structured 
and anonymised, whereas interview data is more personalised and not easily anonymised (Borgman, 2012). 
Thus the character of disciplines is likely to affect approaches to managing data and attitudes towards data 
sharing. 

It seems likely, therefore, that the nature of disciplines, e.g. research methodologies, research 
problems, processes and practices within and across disciplines/sub-disciplines, shapes the nature of the 
data created, the ways in which data are manipulated and stored, the possibility of data sharing and reuse, and 
what constitutes effective data management practice. It is essential, therefore, to understand how the 
nature of disciplines shapes research data management practices. In addition, a deeper understanding 
of the changing nature of disciplines may also reveal how the research data management agenda may bring 
potential changes to disciplines. This understanding is essential if appropriate measures are to be developed 
to increase awareness, encourage best practices and further develop appropriate strategy, services and 
infrastructures for research data management. 

In order to develop an understanding from a disciplinary perspective, one of the common approaches 
is to compare across a number of disciplines. However, in this research, comparisons within a single discipline 
are given more weight. Studies of disciplines at a broad level may falsely represent them as unified. As 
mentioned earlier, the number of studies of research data management in the disciplinary context has 
increased. Most of the studies that have been done are focused on the Sciences, such as astronomy, systems 
biology and genomics, with a few on the Arts, such as classics or artistic research. It appears that relatively less 
research has been conducted on the social sciences. Most of the results of these studies show that much diversity 
exists even within a single discipline, e.g. at the sub-discipline level (Key Perspectives, 2010; RIN, 2008). 
Thus, this research focuses on sub-disciplines in one single social science discipline, which is geography. 

Geography is a well-established discipline in universities. Today, geography departments can be 
found in most parts of the world, for example, the US, Australia, South America, China, Japan and India 
(International Geographical Union, 2013). Geography is chosen not only because there is limited research 
on research data management in this subject, with a few exceptions, e.g. some studies of 
awareness/activities of curating geospatial data (McGarva et al, 2009; Bose & Reitsma, 2005), but also 
because of its internal complexities. 

Geography is clearly divided into two main branches: physical geography 
and human geography. Physical geography concerns the sciences of physical landscapes and environmental 
processes. It belongs to the ‘hard’ side of Becher’s (1987) matrix of disciplinary cultures (Fig. 1). In contrast, 
human geography belongs to the ‘soft’ side of Becher’s (1987) matrix. It concerns human activities on the 
Earth, including cultures, societies and economies, from a spatial perspective. Geography is a dynamic 
discipline. Both physical geography and human geography consist of a wide range of sub-disciplines, 
reflecting diverse research problems in the discipline (Figs. 2 and 3). In addition, geography has had a close 
relation to other disciplines and has developed some well-established interdisciplines, such as quaternary 
science and geo-archaeology. 
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Figure 2: Sub-disciplines in physical geography (adapted from Matthews & Herbert, 2008) 
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Figure 3: Sub-disciplines in Human geography 


Today, geographers employ a wide range of methods for data collection, such as secondary data, 
questionnaire surveys, interviews, visual images, observations and measurements in the field, numerical 
modelling, laboratory methods, spatial modelling and remotely sensed images (Clifford et al, 2010). For 
example, some cultural and historical geographers collect written texts and art images as their data, 
whereas soil geographers collect remote sensing data (Clifford et al, 2010; de Paul Obade & Lal, 2013). 
Broadly, one could claim that human geographers collect qualitative data, while physical geographers collect 
quantitative data. However, the nature of a discipline is dynamic and changing. Research practices in 
geography are being reshaped by technological innovations. Contemporary human geography 
collects not only qualitative but also quantitative data. Economic geographers analyse quantitative spatial 
data, and feminist geographers have adopted GIS technology for qualitative studies (Elwood & Cope, 
2009; Kwan, 2002). It is evident that the nature of geography as a discipline has shaped, and is shaped by, 
the very nature of the data it uses. 

In general, geography has a clear division between hard and soft research traditions; it also has 
strong interdisciplinary links with other academic fields. Thus, by examining this single discipline, it will 
be possible to compare research practices at the sub-disciplinary level and to understand how the 
cultural characteristics of a discipline shape, and are shaped by, research data management. 


2 Research Aim 


The overall aim of this PhD research is to focus on geography and its sub-disciplines with the intention of 
exploring how the nature of a discipline shapes current and potential research data management practices 
and, in turn, how disciplines themselves are being reshaped by changes in data creation, use and 
management. The outcome of this understanding will be insights into how to achieve 
effective data management that is sensitive to cultural identities. 


3 Research Questions 


1. What changes are happening in data practices and research data management policy in Geography 
(e.g. increasing scale of digital data, open data, sharing data, big data, institutional policy)? 

2. How are responses to changes in data shaped by the nature of (geography as a) discipline? What 
disciplinary factors influence research data management practices? 

3. How are disciplines themselves (or how is geography) being reshaped by changes in the very nature 
of data, data creation, use and management? 

4. How can effective data management policies sensitive to cultural identities be achieved? 


4 Methodology 


The research is in two sequential phases. Phase 1 consists of a scoping study which includes 1) a web-based 
study using different techniques, such as link analysis and bibliometrics, and 2) interviews with data 
management experts. Phase 2 is primarily based on a series of interviews with researchers. 


4.1 Phase 1 


The purpose of phase 1 is to understand the nature of geography and its sub-disciplines (e.g. to identify its 
research problems and methods) and its data (e.g. data types); to identify current activities and key issues in 
research data management in geography; and to identify suitable participants for interviews in phase 2. 
The following activities take place: 


• A web-based study of geography departmental websites, supplemented with a link analysis study of 
disciplinary / interdisciplinary interpersonal networks (e.g. based on links between researchers and 
research groups) and a small-scale bibliometric study of geography (topic analysis) 

• Five interviews with data management experts (UK-based), e.g. staff from the UK Data Archive, 
to discover the key issues affecting research data management. 


4.1.1 Phase 1: Web-based study 


The web-based study examined research group leaders’ profiles on departmental websites and their most 
cited publication in the last five years. This part of the study provides an opportunity to explore the 
relationship between research data and the sub-disciplines of geography, and sub-disciplinary differences in 
research data. 

The study examines the following: 


a) The researcher’s profile, e.g. 
• the research / methods interests of the researcher 
• types of publication (e.g. books, journals) 

b) The most cited publication in the last 5 years, e.g. 
• the topic of the research 
• the research method employed 
• the data in the research, including the creation, use and representation of data (e.g. images, 
histograms) 


The sample consists of research group leaders from 15 geography departments, i.e. five top-ranking 
departments in the ‘geography and environmental studies’ unit of assessment in the Research Assessment 
Exercise 2008, five middle-ranking departments, and five low-ranking departments. Research group leaders 
were chosen because they usually have research experience relevant to the focus of their 
research groups. In addition, this sampling approach covers a range of universities with diverse research 
strengths, such as research-intensive universities and teaching universities. 

As part of the web-based study, a link analysis is used to investigate the relationships between 
research groups, departments and interdisciplinary interpersonal networks in geography. This 
part of the study is highly exploratory. Different sets of seed URLs, such as a list of geography department 
home page URLs or a list of individual researcher webpage URLs, are used to address different objectives. 
For instance, in order to explore the relationships between research groups, a list of research group webpage 
URLs is used as the seed URLs. 

A small bibliometric study will also be used as part of the web-based study. Bibliometrics has been 
defined as the quantitative study of published literature (Hood & Wilson, 2001). Bibliometrics has been 
used extensively in the study of scholarly communication, e.g. to explore collaboration patterns in academic 
networks (Velden et al, 2010; Melin & Persson, 1996; Subramanyam, 1983) and emerging research topics of 
disciplines (Glänzel, 2012; Van den Besselaar & Heimeriks, 2006). In this research, Web of Science will be 
used as the data source, focusing on identifying relevant sub-disciplines and exploring relevant scholarly 
practices in geography more broadly. 


4.1.2 Phase 1: Interviews with data management experts 


A small number of interviews will also be conducted to complement the web-based study in order to gain 
a broader understanding of current trends and issues in research data management. Five interviews with 
UK-based data management experts will be conducted to discover their views on current activities and key 
issues in research data management in geography. 


4.2 Phase 2 


The purpose of phase 2 is to investigate researchers' research practices, understandings and attitudes 
towards data management using interviews. A total of 32 interviews (8 researchers from each of 4 sub-disciplines) 
with academic researchers in geography will be conducted in this phase. 

By the time of the conference, preliminary results from phase 1 of the research will be available for 
reporting. 


5 Expected contribution to current understanding 


This research will provide a better understanding of how the complexity of research data, and hence the 
challenges of improving research data management, are grounded in the underlying nature of 
disciplines/sub-disciplines. As a by-product of this understanding, it will enhance the conceptual theory of 
disciplinarity through detailed analysis of changes around research data management in geography and its 
sub-disciplines. In addition, it will contribute to the practical implementation of good research data 
management practices by deepening our understanding of the nature of disciplines. 


Abstract 

We present a visualization of subject headings that typically accompany books as flat textual metadata. 
The purpose of the visualization is twofold: first to expose the implicit structure in subject headings as 
an overview of a library collection and second to present a visual web of keywords to invite exploration 
of books. Taking a tag cloud as a starting point, the visualization extends it to a networked tag cloud 
that respects the hierarchy that is implicit in subject headings. By allowing an information seeker to 
successively build a subject filter, while seeing the results at each step, we hope to improve the searcher’s 
orientation in a comprehensive book collection. 


1 Introduction 

With many libraries digitizing their collections it is becoming a challenge to explore these information 
spaces. In a traditional brick-and-mortar library one may walk past many displays and magazine racks 
encountering many aspects of the book collection. The development towards digital library collections and 
high-density storage makes it more difficult to get that sense of a collection. However, many libraries have 
already put considerable effort into categorizing their collections with textual subject headings. These 
subject headings are typically ordered as lists of topics, which identify what subject matters are discussed 
in the book. For example, a subject string such as “Spain — History — 20th century” contains three topics. 
Topics that appear first are more general, while topics that appear later are more specific for a given book. 
A cataloguer would attach one or more subject headings to each book, each heading having three to four topics. 
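
As a minimal sketch, a subject string of this kind could be broken into its ordered topics as follows. The function name `split_heading` and the `--` separator are assumptions made for illustration; the delimiter in a given catalogue export may differ.

```python
def split_heading(heading, sep="--"):
    """Split a subject heading string into its ordered list of topics.

    The separator is an assumption; pass whatever delimiter the catalogue uses.
    """
    return [part.strip() for part in heading.split(sep) if part.strip()]


topics = split_heading("Spain -- History -- 20th century")
print(topics)       # general topics come first, more specific ones later
print(len(topics))  # number of topics in this heading
```

Keeping the topics in a list, rather than a set, preserves the general-to-specific order that the rest of the paper relies on.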

Traditionally a library visitor would use a physical card catalog to search through subject headings 
in order to find a book on a given topic. Today, subject headings are used in keyword search as part of 
digital library interfaces. Either way this requires a searcher to have a relatively specific goal. The searcher 
cannot easily browse the collection by subject headings or get a sense of how many books are about the 
respective topics. In a digital library, a reader enters a search query comprising one or several keywords 
leading to a list of search results. These results will be related to the search terms and rarely contain an 
unrelated book, which is great when the searcher’s goal is specific. However, when the searcher has a more 
general interest, they typically only see subsets of the collection. There is no sense of the entire collection. 
In a physical library, books are shelved according to a given system such as the Dewey Decimal System or simply by 
the author’s last name. Books related to a given topic may not be placed next to each other. In a digital 
library, books are often presented in decreasing order of relevance to a search term. This is very different from 
the physical library, where books cannot be reordered based on each visitor’s preference. 

We aim to bring a sense of the whole back while supporting the successive filtering and refinement 
of a collection. Towards that goal we present a visualization tool that uses textual subject headings in a 
collection of books to aid exploration. Subject headings offer a rich, under-utilized data source that can help 
provide a sense of context to a library collection in a digital interface. 


2 Background 


The initial aims behind this work were to expose interesting aspects of a library catalog, facilitate the casual 
filtering of books, and help the searcher to gain a better understanding of the collection. We have refined 
these goals in collaboration with a library professional whose work relates to the management of catalogs 
at our university library. In conversations with him we learned about the various fields of the typical 
database schema for library collections. Subject headings such as Library of Congress Subject 
Headings (LCSH) came up repeatedly. On the one hand, subject headings are a form of metadata that 
is very useful in characterizing books; on the other hand, they have not been fully utilized in catalog 
interfaces. We subsequently ran a sketching session with six researchers from our lab who were asked to 
sketch visual representations of 15 books and their subject headings to facilitate information exploration. 
Several sketches focused on the directionality of tags in subject headings, to which our library expert responded 
positively. While he acknowledged the variation among cataloguers, he viewed subject headings as a 
fruitful middle ground between highly structured classifications such as the Dewey Decimal System and less 
structured approaches like free-form tagging. Based on our initial goals and the subsequent discussions and 
explorations, we developed a visualization technique that exposes the ordered relationships among tags in 
subject headings. 

Many visualization systems have been developed that represent the tags or words in a collection 
or document. For unstructured text, several visualization techniques have been introduced that allow for 
the exploration of relationships among terms in textual documents and tags in collections. Word tree 
(Wattenberg & Viégas, 2008) presents a visual concordance that exposes the varying distributions of 
word-to-word sequences in a single text. Based on a selected word acting as the tree’s root, any words that follow 
the selected word are branched off of it. Words that appear further away are connected to their respective 
predecessors. Phrase nets is a related technique that represents the latent network structure in a text where 
the words are connected by specific words, phrases, or patterns (van Ham, Wattenberg, & Viégas, 2009), 
providing a network approach to an unstructured text. Both techniques present unique perspectives on a 
single text, but are not designed to facilitate search across a collection of resources. 

Tag clouds (Viégas & Wattenberg, 2008) are the most widely deployed text visualizations providing 
an overview of the frequency of words or terms either of a text or of a collection of photos or bookmarks. 
Tag clouds use font size to indicate the frequencies of terms in a data set. While it may not be ideal for a 
precise reading, a tag cloud provides a general sense of which terms are dominant in a given text or 
collection. However, tag clouds are not designed to work with sets of ordered tags such as subject headings 
and typically do not display connections between such tags. Tag clouds are also used in the VisGets 
interface (Dörk, Carpendale, Collins, & Williamson, 2008), which supports, besides filtering, a brushing mechanism 
that shows the correspondence between other facet values as well as the results. 


3 Spidey Sense 
For the purpose of designing a visualization of a book collection we used a subset of books from the Open 
Library, an online project to create a universal book catalog. Besides typical metadata such as author, 
title, year, and blurb, the site also exposes LCSH terms as ordered tags. We focus here on the interplay 
between the visualization of the subject headings and the display of the books as the result space. 

The Spidey Sense interface has two main parts: a subject visualization on the left and a result view 
on the right (see Figure 1). The results contain the books retrieved from the query that is formulated in 
the visualization view by selecting tags. With no tag selected, the visualization view shows the top 50 tags 
from all subject headings across all books in the collection. At this initial stage the visualization is similar 
to a tag cloud: each tag represents a topic from a subject heading of a book. We decided to encode the 
frequency of tags using a circle instead of font size alone to make it easier to compare the differences in 
frequency. 
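
The initial view can be approximated with a simple frequency count. The sketch below is written in Python for illustration only (the prototype itself is JavaScript); the record layout, the function name `top_tags` and the choice to count a tag once per book are assumptions, since the paper does not specify how frequency is computed.

```python
from collections import Counter

# Hypothetical in-memory records: each book carries a list of subject
# headings, and each heading is an ordered list of tags.
books = [
    {"title": "Book A", "subjects": [["Great Britain", "History", "19th century"], ["Fiction"]]},
    {"title": "Book B", "subjects": [["United States", "History", "20th century"]]},
    {"title": "Book C", "subjects": [["Spain", "History"], ["Spain", "Description and travel"]]},
]


def top_tags(books, limit=50):
    """Return the `limit` most frequent tags as (tag, number_of_books) pairs.

    Assumption: a tag is counted once per book, even if it occurs in
    several of that book's headings.
    """
    counts = Counter()
    for book in books:
        counts.update({tag for heading in book["subjects"] for tag in heading})
    return counts.most_common(limit)


initial_view = top_tags(books)
print(initial_view[0])  # the most frequent tag and its book count
```

The counts returned here would then drive the circle sizes in the display.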

Once a tag is selected the visualization is separated into three parts: left, middle and right (see Figure 1). 
Constituting the current query, the selected tags are placed into the middle of the visualization. The left 
side of the view has many tags varying in size scattered from the top to the bottom. These are tags that 
immediately precede the current selection in the subject headings across all the books that match the current 
tag selection. The right side is structured similarly but shows tags that immediately follow the current 
selection. The tags are connected with subtle curves indicating the ordering of subject tags. 
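
One possible reading of the left and right tag sets is sketched below in Python (for illustration; the prototype is JavaScript). The function name `neighbours`, the record layout, and the rule that a book matches when it contains every selected tag are assumptions, not details confirmed by the paper.

```python
from collections import Counter

# Hypothetical records: each heading is an ordered list of tags.
books = [
    {"title": "Book A", "subjects": [["Great Britain", "History", "19th century"], ["Fiction"]]},
    {"title": "Book B", "subjects": [["United States", "History", "20th century"]]},
    {"title": "Book C", "subjects": [["Spain", "Description and travel"]]},
]


def neighbours(books, selected):
    """Count tags that directly precede (left) or follow (right) a selected tag.

    Only books whose headings contain every selected tag are considered.
    """
    selected = set(selected)
    left, right = Counter(), Counter()
    for book in books:
        tags_in_book = {t for h in book["subjects"] for t in h}
        if not selected <= tags_in_book:
            continue  # book does not match the current selection
        for heading in book["subjects"]:
            for i, tag in enumerate(heading):
                if tag not in selected:
                    continue
                if i > 0 and heading[i - 1] not in selected:
                    left[heading[i - 1]] += 1
                if i + 1 < len(heading) and heading[i + 1] not in selected:
                    right[heading[i + 1]] += 1
    return left, right


left, right = neighbours(books, ["History"])
print(sorted(left), sorted(right))
```

The two counters correspond to the left and right columns of the view, and their counts could size the circles.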

Figure 1: The visualization view on the left is juxtaposed with the results view on the right. 


A tag’s label and association with the books in the result list can be explored by hovering. When the cursor 
is moved over a circle the tag label is enlarged and moved out of the circle so that it can be read in full (see 
Figure 2). The hover operation also highlights the books in the result view that have a subject heading 
containing that tag. Likewise hovering over an entry in the result list highlights the corresponding tags in 
the visualization. Brushing allows an information seeker to draw connections between tags and books. Its 
aim is to strengthen the relationship between the visualization and the results to help the searcher gain a 
sense of orientation in the larger collection. 

Figure 2: Brushing over a tag in the visualization highlights relevant books in the results view. 
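
The two directions of brushing can be sketched as a pair of lookups. This is a Python illustration under assumed names (`books_with_tag`, `tags_of_book`) and an assumed record layout, not the prototype's JavaScript code.

```python
# Hypothetical records: each heading is an ordered list of tags.
books = [
    {"title": "Book A", "subjects": [["Great Britain", "History", "19th century"], ["Fiction"]]},
    {"title": "Book B", "subjects": [["United States", "History", "20th century"]]},
    {"title": "Book C", "subjects": [["Spain", "Description and travel"]]},
]


def books_with_tag(books, tag):
    """Hovering a tag: titles of books with at least one heading containing it."""
    return [b["title"] for b in books if any(tag in h for h in b["subjects"])]


def tags_of_book(book):
    """Hovering a book: every tag that occurs in any of its headings."""
    return {tag for heading in book["subjects"] for tag in heading}


highlighted_books = books_with_tag(books, "History")
highlighted_tags = tags_of_book(books[0])
print(highlighted_books)
print(sorted(highlighted_tags))
```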


When a tag is clicked the visualization transitions to the filtered stage. The selected tag will move to the 
center. All tags that have a first-order relationship to the chosen tag remain visible and are linked to the 
chosen tag. Tags that appear immediately before the selected tag will move to the left and tags that appear 
immediately after will be moved to the right. All tags are visually connected by curved lines to the center 
tag or tags either to the left or right of it depending on their position in the subject headings. Considering 
that the tags at the beginning of a subject heading represent more general topics, the left side of the 
visualization contains more general tags. Similarly, the tags on the right side in the visualization should be 
more specific. Clicking on a tag on either side (left or right) will add that tag to the selected nodes in the 
middle, which results in a refinement of the result space. Conversely, clicking on a tag in the middle 
will remove that tag from the selection, which broadens the query and expands the result space. If the last 
remaining middle tag is clicked, the system returns to the initial stage. 
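
The click behaviour amounts to toggling a tag in the selection and re-filtering the books. The following Python sketch illustrates this under assumptions: the names `toggle` and `matching` are hypothetical, and a book is taken to match when every selected tag occurs somewhere in its headings.

```python
# Hypothetical records: each heading is an ordered list of tags.
books = [
    {"title": "Book A", "subjects": [["Great Britain", "History", "19th century"], ["Fiction"]]},
    {"title": "Book B", "subjects": [["United States", "History", "20th century"]]},
    {"title": "Book C", "subjects": [["Great Britain", "Description and travel"]]},
]


def toggle(selected, tag):
    """Return a new selection: remove `tag` if present, otherwise add it."""
    if tag in selected:
        return [t for t in selected if t != tag]
    return selected + [tag]


def matching(books, selected):
    """Books containing every selected tag; an empty selection matches all."""
    wanted = set(selected)
    return [b for b in books if wanted <= {t for h in b["subjects"] for t in h}]


selection = toggle([], "History")               # click a tag in the initial view
selection = toggle(selection, "Great Britain")  # click a side tag: refine
refined = [b["title"] for b in matching(books, selection)]
selection = toggle(selection, "History")        # click a middle tag: broaden
broadened = [b["title"] for b in matching(books, selection)]
print(refined, broadened)
```

Because matching depends only on which tags are selected, not on the order of clicks, the same query can be reached along different paths.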

The intention behind the design is to help the searcher make sense of the changing arrangement of tags as they 
navigate between sets of books. The tags are gradually transitioned between display states to help 
the viewer follow the changes on the screen. After a query change is issued, tags that exist in both 
the before state and the after state are not removed but rather moved to their new location. This is done 
to maintain a sense of fluidity. 
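
A transition of this kind can be planned by comparing the sets of visible tags before and after a query change. The sketch below is a Python illustration; the function name `plan_transition` and the three group labels are assumptions, not the prototype's terminology.

```python
def plan_transition(before, after):
    """Split tags into those that persist, appear, or disappear."""
    before, after = set(before), set(after)
    return {
        "move": before & after,   # kept on screen and animated to a new position
        "enter": after - before,  # newly shown tags
        "exit": before - after,   # tags that leave the display
    }


plan = plan_transition(
    before={"History", "Fiction", "United States"},
    after={"History", "Great Britain", "19th century"},
)
print(sorted(plan["move"]), sorted(plan["enter"]), sorted(plan["exit"]))
```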

The prototype queries a set of 1000 books from the Open Library database. Open Library is open 
for anyone to add bibliographic information about books. The project is implemented as a web-based system, 
written in JavaScript. Processing.js² is used to draw to the canvas element of HTML5 and the JavaScript 
library springy.js³ is used to control the force-directed layout. When the web page is loaded it retrieves a 
JSON file containing the metadata about the books. It immediately parses it and stores it in memory. Each 
query is then made on this data store. 
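
The load-once, query-in-memory pattern can be sketched as follows. This is a Python illustration rather than the prototype's JavaScript; the payload, its field names and the inverted index built by `build_index` are assumptions and do not reflect the Open Library schema or the authors' data structures.

```python
import json
from collections import defaultdict

# Hypothetical payload standing in for the JSON file fetched at page load.
payload = """[
  {"title": "Book A", "subjects": [["Great Britain", "History", "19th century"]]},
  {"title": "Book B", "subjects": [["United States", "History"]]}
]"""

books = json.loads(payload)  # parsed once and kept in memory


def build_index(books):
    """Map each tag to the set of titles whose headings contain it."""
    index = defaultdict(set)
    for book in books:
        for heading in book["subjects"]:
            for tag in heading:
                index[tag].add(book["title"])
    return index


index = build_index(books)
# A two-tag query is answered by intersecting sets, with no further request.
hits = index["History"] & index["Great Britain"]
print(sorted(hits))
```

For a collection of about 1000 books, a plain scan per query would likely also be fast enough; the index is shown only as one option.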


2 http://processingjs.org/ 
3 https://github.com/dhotson/springy 




iConference 2014 Cody Coljee-Gray et al. 



Figure 3: Example use case showing the initial state (top left), one selected tag (top middle), two selected 
tags (top right) and brushing when hovering over a book (bottom) 


4 Use-Case Scenario 


To demonstrate the potential of this project we provide a brief scenario. Suppose there is a searcher, named 
John, who is interested in finding a new book to read. He uses the Spidey Sense interface to find a book. In 
the initial view, he notices that “History” is fairly large compared to the other tags (see Figure 3, top left). 
This indicates there is a large number of items in this collection about “History.” Being generally interested 
in history, John clicks on the tag and the system updates the other tags. With “History” selected in the 
center he can see that the largest circles have the tags “United States”, “19th century” and “20th century” 
(top middle). This indicates these tags are the most common tags occurring in subject headings directly 
before or after a “History” tag. John clicks the tag marked “Great Britain.” Now the visualization indicates 
that most of the books in this subset are about the “19th century” because that tag is the largest (top 
right). After seeing Oliver Twist among the results (bottom right), John moves his mouse over the book 
and some tags in the visualization change color (bottom left). He can see that Oliver Twist has at least one 
subject heading containing the substring “Great Britain — History — 19th century.” John selects “19th 
century” and hovers over Oliver Twist again. Now the “Fiction” tag highlights. John decides he wants to 
read Oliver Twist. He selects the result and puts a hold on the book in his local library. 

Alternatively, John could have started with “Fiction”, then “19th century”, next “History” and 
finally “Great Britain.” He would essentially be navigating through the hierarchy from the bottom up. This 
also works if he starts with “Great Britain” or “19th century.” John can form the same query via multiple 
paths. 

This is contrary to the typical top-down category structure often employed. 


5 Conclusions 


The Spidey Sense interface takes tag clouds, a visualization many people are familiar with, as a starting 
point and introduces hierarchy to aid a more structured exploration of books. While the visualization allows 
an information seeker to use subject headings to explore books, it does not impose a certain order of 
navigating the collection. As digital libraries become more prevalent it is possible to explore new types of 
visual interfaces for navigating their contents. Considering that a lot of effort is invested in the creation 
and maintenance of a catalog, there is an opportunity to expose the rich relationships that collections 
contain. The visualization of the Spidey Sense interface adds a sense of order to an otherwise orderless tag 
cloud. Presenting this order to searchers lets them gain a different perspective on a library data set. 

There are many ways this project can be taken further. First, it would be interesting to deploy the 
interface with an actual library catalog to investigate how searchers would make use of it. Second, it would 
be promising to investigate how such a subject visualization can be integrated with the search mechanisms 
of existing catalog interfaces. Third, we are interested in application areas beyond the library where the 
order of tags, for example, in the hierarchy of a shopping website, could be respected in order to provide a 
more exploratory experience. Often these hierarchies are loosely defined. For example, if one is searching for 
paper, this might be in the stationery department if the intent was writing paper or in the household 
department if the intent was a paper towel. Spidey Sense provides an elegant way of exposing this 
ambiguity. 


Abstract 

A photo archive contains diverse narratives that only get partially exposed in digital interfaces. In this 
paper we explore a potential framework for archivists and designers to create photo archive interfaces 
that are sensitive to the ethos and social context of the archive’s content. We outline our approach to engaging 
with archival projects and present the results of a pilot workshop, which raised a range of complex 
questions about the design of visual interfaces. Our aim is to practically and conceptually expand how a 
visual interface would let a visitor access, explore, and interpret the contents of an archive. To do this 
we are interested in the different associations that people weave between the artefacts of an archive. 


1 Introduction 


The history of archives is closely tied with the formation of the modern state (Featherstone 2006). Early 
photo archives provided a new kind of evidence for criminology (Sekula 1986). Besides their role as 
instruments for social rule and repression, archives have also been seen as a “repository of memories: 
individual and collective, [...] legitimating and subversive” (Bradley 1999). These diverging notions of the 
archive “promise the recovery of lost time, the possibility of being reunited with the lost past” (Freshwater 
2003). However, this allure of the archive has recently been contrasted with a growing recognition of “the 
impossibility of recovering the lost voices of the past in their original meaning” (Bradley 1999). Instead we 
should give up the illusion of recovery and acknowledge that archives can merely support the re-creation of the 
past as something new. 

In light of the digitization efforts underway in the cultural sector, we are interested in exploring 
how critical and creative engagements with archives could be supported. There has already been some 
research on the visualization of photo sets and cultural collections (e.g., Bederson 2001 and Whitelaw 2009). 
However, there is little critical discourse around the power and promises of archival interfaces. In this paper, 
we outline an approach to archive interfaces that is sensitive to the politics of a collection and supports new 
forms of interpretative engagements. We practically explore this approach in a pilot workshop, which raised 
questions related to the design of visual interfaces for the photo collection of a specific archive project. 


2 From Promises to Potentialities 


When thinking about the representation of an archive, careful consideration needs to be devoted to its 
ordering structures. A given classification may open up certain pathways, but it can also close others. For 
example, the classification systems used in 18th-century archives demonstrated a clear “preference for binary 
divisions and branching tree structures” (Featherstone 2006), possibly impeding more lateral explorations. 
However, in the digital sphere new systems of ordering become possible. There have been some efforts in 
visualizing collections in zoomable interfaces (Bederson 2001) and arranging photos based on their temporal, 
spatial, visual, or topical similarity (Girgensohn et al. 2010). Seeing a collection as a “rich set of data” 
(Whitelaw 2009) speaks to the hope that visualization can support new forms of archival engagement. 
Counter to the ‘stinginess’ of conventional search, Whitelaw (2012) makes a plea for more generosity in 
archival collection interfaces. Instead of hiding the richness of an archive behind the inhospitable face of a 
search box, the abundance of a collection should be made accessible to an information flaneur (Dörk 2011). 
Novel visual interfaces promise to bring about unconventional qualities of cultural collections. However, to 
some degree existing collection interfaces and visualizations tend to perpetuate the seductive allure of the 
archive. While this allure has been challenged from different angles over the last 15 years, we still lack a 
critical discourse around issues of representation and interpretation in the context of archive interfaces and 
visualizations. How can we include the concerns about the promises of archives in the design of novel archive 
interfaces? Drucker (2013) suggests that we should treat interfaces as artefacts that require and enable our 
critical attention. Can we open up archive interfaces to interpretation, but also treat them as interpretative 
devices? 

The rapid growth of early photo collections of criminals triggered inventive, if ethically dubious, 
approaches to classification in order to ensure efficient retrieval (Sekula 1986). Similarly, today’s digitization 
efforts are primarily seen as a means to enable efficient retrieval, which is often associated with a reduction 
of pleasure and a distancing of the researcher (Dorney 2010). We believe that digital interfaces have great 
potential to provide new perspectives through visual and interactive representations. Instead of focusing 
exclusively on the literal characteristics of an artefact, we are interested in a close and profound engagement 
with the archive as both thing and theory. Acknowledging the role of both archive creators and visitors, 
the challenge is to encourage a dialogue between author and audience among equals (Feinberg 2012). As 
the digital sphere offers new ways of seeing the world “through different tropes derived from flows, 
non-linearity and singularities” (Featherstone 2006), we hope to develop an approach to designing new types of 
visual interfaces that negotiate the promises and potentials of digital archives. 


3 Amber Collective 


To learn about a specific archive, we initiated a collaboration with Amber, an independent film and 
photography collective that was founded in 1969 in the North East of England to collect documents of working 
class culture. In contrast to the mandate of a national archive keeping a representative record, Amber has 
been explicitly dedicated to the stories of working class and marginalized communities. In 2011, the cultural 
significance of the interlinked narrative represented by 22 Amber films and Sirkka-Liisa Konttinen’s 
photography was recognised by UNESCO and inscribed in the Memory of the World register. Amber's 
archive represents many hundreds of collections and many thousands of photographic images in addition to 
film and texts. Each collection (whether from work produced by members of the collective, commissioned 
or collected by them) can be seen as part of a rich and complex whole. To retain the archive’s identity 
means making apparent implicit relationships between materials, which go beyond simple associations of 
author, place, or date. Amber have started to explore their archive as a whole and expose otherwise hidden 
material to the world: allowing for new readings, interpretations, and the potential remixing of content 
while retaining the integrity of the collection. It is our joint goal to investigate the potential of digital 
interfaces for archival collections, while paying close attention to their ethos. 


4 Workshop 

We organized a three-hour pilot workshop titled “Finding new Pathways in Rich Archives” to practically 
and conceptually engage participants in thinking about how an archive interface might look and what it 
could offer to archive users. Inspired by the framework of the workshop ‘Desktop Psychogeographies’ 
organized by Pitsillides and Maragiannis (2013), our event was divided into four stages:

• Stage 1. At the beginning the 10 participants introduced themselves to each other. The group
consisted of visual artists, filmmakers, photographers, a project manager, and researchers from
various disciplines including architecture, literature, and computing. Three participants were
members of the Amber collective, five were associated or had some familiarity with the Amber
collection, and two were new to the material.

• Stage 2. We introduced notions such as the information flaneur (Dörk 2011) and generous interfaces
(Whitelaw 2012) and demonstrated existing archive interfaces. A member of the Amber collective
gave a brief account of some of the archive images, specifically those relating to urban development, that
were presented on the table to participants.

• Stage 3. Using diverse collaging materials (Figure 1), participants were invited to remix images as
collages to explore associations and narratives, though not explicitly designing an interface. We
asked participants to think about semantic relations and affinities between things that may not
ordinarily be revealed through common taxonomies of categories.

• Stage 4. At the end participants were asked to share their collages on a wall (Figure 2) to allow for
a round of collective interpretations. The respective creator of a collage was asked to remain silent
at first, while other participants considered and responded to what they saw, after which the creator
added their intention to the discussion.


Figure 1: Participants created collages with images from the Amber archive. 


The workshop produced a rich array of discussions that related to the specifics of this archive but also to 
more general questions about archival collections. Through preliminary analysis we extracted several 
themes, which often interacted with each other in the conversations during the workshop. Here we present 
three themes and reflect on some limitations of the workshop. 


4.1 Meaningful engagement and audience's creative license 


When attention was drawn to a single photograph, participants frequently asked each other what a 
particular image was about and why it was chosen as part of the collage. The way participants collectively 
engaged in exploring a photograph with an understanding of the ethos behind it, especially with its original 
author also being amongst the participants, created a unique channel. The participants who were less
familiar with the Amber archive were able to tease out new narratives from within the archive by creating 
collages that prompted the original authors to search their memories and make sense of the seemingly 
random associations presented on the canvas. The collages acted as an elicitation tool that encouraged the 
archive members to see the old content through a new lens, thus making familiar subjects once again
unfamiliar. Likewise, participants unfamiliar with the archive were also able to visualize associations using 
archival images that were most inspiring to them. For instance, one participant created a collage that 
focused on the human body. The participant explained: 


“It was kind of about bodies, emotions, feelings, and lived experience, of being in this environment 
that I think is in that collection. If I were the viewer, that is the thing I would look at and follow 
up, I wouldn’t look at architecture...” 


However, from the point of view of the authors, there were concerns about letting people create their own 
narratives using archival images. These concerns related to a sense of responsibility and trust they have 
established since working with people whose lives they documented. One participant stated: 


“You couldn't make all the connections, but you can take certain themes, and explore that. Just 
to show people what the possibilities are (...) But if everybody can just pile in and do things, there's 
a risk of losing the story you're trying to open up.”


Engaging participants in collaging and storytelling may serve to unlock hidden themes in a collection by 
creating new linkages. However, introducing new linkages also highlighted a tension between seeing an 
archive through new and old lenses. As interface designers we are faced with the question: ‘how much
creative license should a viewer have when exploring and juxtaposing content online?’ The group
interpretation not only allowed archivists and designers to confront the limits of an archive interface, but
also helped us gauge the extent to which a user may manipulate the visual content of an archive.


Figure 2: Collages were pinned up on the wall for the group interpretation. 


4.2 Re-presenting the archive


The Amber photo archive still exists largely as prints, and during the workshop we explored those images as
printed copies. Participants talked about how some photographs looked older than others even
though they were all print copies. In some cases a colour image tended to appear more contemporary than
a black and white one. Scale (see Figure 3) also had an impact on how people viewed images on the collage. For
instance, when looking at a larger clustering of photos one participant said: 


“And that’s [referring to a bigger cluster] stronger than that [pointing at a smaller cluster], if people 
have the same sort of partial attention span I have, they’ll go straight to that [big cluster] and they 
may potentially miss that [small cluster] or at least think that’s [small cluster] much lower down 
the scale of importance.” 


Using paper collages we created low-fidelity prototypes of interfaces. These paper prototypes in turn 
highlighted a set of visual qualities (colour, layout, size and shape), which had direct impact on the ways 
participants chose to navigate through information. On the one hand, those qualities functioned as visual aids
that focused one’s gaze on the canvas and helped create a centre of interest for the group discussion. On the other hand,
there were also visual qualities that were deliberately distracting and offsetting. For example, one
participant used a “punk-ish”¹ trope to illustrate that the collage had an underlying message that lay not
only in the photographic content but also in the way the images were cut out and arranged. Using visual
aids to portray ambiguity or certainty may thus create conflicts in archival interface design. However, it
is a reminder that we need to take into account how socially-engaged content like the Amber archive may
create unconventional requirements in terms of user interface design. 


Figure 3: Some participants used different image sizes to suggest hierarchy amongst images.²


1 By “punk-ish” we are referring to images that were torn off from a page or cut out irregularly.
2 Due to copyright restrictions on the archival images, we are not able to show close-ups of the individual collages.


4.3 Storytelling and ethics


As we discussed above, in some collages there were elements of storytelling deployed by participants to 
establish a coherent network of photographs within the collage. This indirectly creates a certain amount of 
misrepresentation of subjects’ identities in the photo collection. In a way this is an inevitable aspect of
making creative use of a historical account of past lives. Participants created several different pathways on
the collages using narrative structures such as history, politics, gender and personhood. Depending on how
images were juxtaposed, participants also observed parallel narratives weaving through the same collage.
In many ways these are ‘standard practice’ when one engages in making collages. However, we also noticed
that employing human subjects on the collage could become particularly sensitive, especially when we
learned that many people depicted in the photos had established a certain understanding and trust with
the photographers, who promised that they would maintain their images in an appropriate context, or
perhaps out of the public domain. For instance, when asked to comment on how they feel when seeing their
own photos being used in creative ways, one of the participants explained:


“Depends on the purpose, really, I guess...not at all offensive seeing it here in this context. But are you 
asking [if] those images were free to be used by anybody in the world, in their chosen context whether I’d 
feel comfortable? I probably wouldn't. Because some [people] are extremely private regarding their 
whereabouts.”


Offering tools to construct complex narratives on an interface could help enrich and deepen an audience’s 
understanding of the archival project. However, it was also clear from the paper prototypes that we can
easily overlook the biographies of each individual image. The workshop exercise was not able to address
this problem, and further research is needed to establish a framework that addresses the ethical boundaries of
user interaction.


5 Conclusion 


Overall, participants’ response to the collages was very diverse. We think this illustrated that participants 
were able to engage with complex narratives using the relatively simple technique of collage making. Our 
preliminary analysis of the video recordings showed that people approached the collages by looking for sets 
of meanings that were encoded by their authors. While most participants assigned meanings and associations 
onto the collages by looking at how individual photographs were laid out, perhaps unsurprisingly, members 
of the Amber collective were most active in uncovering hidden narratives based on their knowledge of the 
photographs. Many motifs in the collages adopted by participants were in line with Amber’s aim to 
document the impact of urban re-development on local communities. As participants took the workshop as 
an opportunity to engage with over 60 documentary photographs, rich insights and stories about these
photographs and their social context gradually unfolded throughout the workshop as we walked through each
collage trying to unpick its underlying meaning. Nonetheless, these collages also challenged and elicited
concerns regarding the future development of the Amber archive. In a sense, the workshop engaged its
participants in a meaning-making exercise. Some participants made collages as a lens to gain insight into a
complex collection of stories represented by photographs, whereas the archivists challenged their own
understandings of the collection. As part of an interface design inquiry, we see the workshop process as
a crucial step towards understanding how a new audience may interact with an archival project, especially when its
content represents strong political identities. The multidisciplinary interests amongst the participants 
prompted a multi-faceted discussion on a unique archive, its richness in terms of the represented struggles 
and stories, and its future in the digital realm. 


Abstract

Research data as an integral part of the scholarly record is increasingly attracting attention of all 
stakeholder groups. Higher education institutions, funding agencies, policy makers as well as the public 
at large see benefits in accelerating science through opening access to research data. More specifically, 
this aims at better re-usability and verification of research findings. The latter is of particular
interest to higher education and other research institutions, as they embody scholarly scrutiny and trust.
Once established as a university of a new type that should unify research and teaching, Humboldt-
Universität has recently created a new position to develop an institutional concept for research data
management. In this paper we present the initial situation along with preliminary survey findings and
draw out the consequences for multidisciplinary higher education institutions, taking the example of
Humboldt-Universität zu Berlin (HU).

Keywords: research data, research data management, survey, survey results, higher education institution, responsible conduct of 
research, research integrity 

Citation: Simukovic, E., Kindling, M., & Schirmbacher, P. (2014). Unveiling Research Data Stocks: A Case of Humboldt-
Universität zu Berlin. In iConference 2014 Proceedings (p. 742-748). doi:10.9776/14351

Copyright: Copyright is held by the authors.

Research Data: Simukovic, Elena et al. (2013). Humboldt-Universität zu Berlin Research Data Management Survey Results.
doi:10.5281/zenodo.7448 & Simukovic, Elena et al. (2013). Humboldt-Universität zu Berlin Research Data Management Survey
Results. Comparing respondent groups “Professor” and “Research associate”. doi:10.5281/zenodo.7449


Contact: elena.simukovic@cms.hu-berlin.de, maxi.kindling@ibi.hu-berlin.de, schirmbacher@cms.hu-berlin.de 


1 Introduction 


Since the Budapest Open Access Initiative in 2002 and the Berlin Declaration on Open Access to Knowledge 
in the Sciences and Humanities in 2003, the research community has been striving for unrestricted access to
scientific outputs. Through the green and gold routes to open access, scholarly literature is increasingly being published
in an "open" mode of access. Some studies even state "that open access is reaching the
tipping point" (European Commission, 2013). Access to research data is still in its infancy, though. After
recent high-profile data manipulation scandals, universities are becoming more conscious of the importance
of research data, which often fundamentally underlie the presented results.¹ Moreover, some funding
agencies are fostering the cultural change in a more progressive manner, for example by asking for a 'data
management plan' when applying for funds.

In Germany, the German Research Foundation (DFG) as a major funding agency plays a significant 
role in setting the general framework for research practice. Its recommendations for responsible conduct of
research, including preservation of underlying data for at least ten years, were passed as early as 1998 (and
have been updated recently). These recommendations have been incorporated as good scientific practice by
a large majority of German universities. However, most of them do not actively promote research data
management (RDM) activities in their respective communities. HU is presumably the first German
university to have surveyed current research data holdings and researchers' needs for institutional support in
issues pertaining to research data management to such an extent. We explored different practices across
scientific domains and academic career levels, the problematic notion of "research data" itself, and the
institutional role in supporting good research data management.


1 See for instance Tilburg University (2012) 


2 Methods


Following the idea of re-usable research results, we decided to take advantage of the work of other Higher Education
Institutions (HEIs) in surveying research data management in their respective communities. Thanks
especially to the expertise of the Digital Curation Centre² and numerous projects in the course of the JISC
Managing Research Data Programme³, we were able to adopt the questionnaires of the University of
Glasgow, Imperial College London and the University of Cambridge ("Data Asset Framework" and
"Incremental" projects). Furthermore, our efforts benefited from surveys conducted at the Swiss Federal Institute of Technology
(ETH Zürich) and in the PARSE.Insight project. We then adapted these questionnaires to local
circumstances at Humboldt-Universität zu Berlin (HU) and developed a final questionnaire consisting of 24
questions.

The survey targeted academic staff across all disciplines and departments at HU excluding service 
or administrative personnel. It was based on the assumption that this target group produces or
processes research data in its daily work. A special mailing list was then created and an open source survey
application used to build an online questionnaire. After a short pre-test, the survey officially started and
ran for six weeks in January-March 2013.⁴

Simultaneously, a hands-on seminar for master's students was run during the winter semester 2012-13
at the Berlin School of Library and Information Science. During this period we analyzed different
approaches to organizing research data management support in HEIs in the UK, the US, Australia, Germany and
Switzerland. Based on this international comparison, a list of recommendations for next steps at HU was
produced for the university's management board. A comprehensive report together with principal survey results
was presented at the Berlin Library Science Colloquium at the end of May 2013 (Kindling et al., 2013).


3 Key Findings 


3.1 Response Rate 


Compared with the analogous surveys at universities in the UK and Switzerland mentioned above, a
relatively high response rate was achieved: 499 responses in total, i.e. approximately 24%. This was all the
more encouraging as researchers from all faculty departments participated, allowing a detailed analysis
and comparison of results to be made. Additionally, over 70 participants expressed their interest in an
individual interview going beyond the questionnaire.
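
The arithmetic behind the reported response rate can be made explicit. The following is a minimal sketch that uses only the two figures given above (499 responses, approximately 24%); the size of the invited population is not reported in the text, so the value computed here is an implied estimate, and the variable names are illustrative.

```python
# Figures reported in the text: 499 responses, approximately 24% response rate.
responses = 499
reported_rate = 0.24  # rounded value as stated in the text

# Implied number of invited researchers. This is an estimate derived from
# the rounded rate, not a figure reported by the survey.
implied_invited = round(responses / reported_rate)

# Any true rate between 23.5% and 24.5% would round to 24%, so the
# implied population size is only known within this range.
low = responses / 0.245
high = responses / 0.235

print(f"Implied invitations: about {implied_invited}")
print(f"Range consistent with rounding: {low:.0f} to {high:.0f}")
```

The spread between the two bounds shows why the rate should be read as approximate rather than as an exact share of the target group.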

The difference in participation rates between the respondent groups "Professor" (ca. 29 % of all professorships)
and "Research associate" (ca. 13 % of all research associates at HU) was a little unexpected. Presumably,
these numbers reflect the role of senior researchers being more accountable for RDM issues as well as some
senior researchers responding on behalf of the whole working group. Our further analysis also showed a less
marked familiarity with regulating policies and possibilities to publish or re-use research data among
younger researchers, indicating a need to communicate related information more efficiently (see e.g. Figure
1).


2 http://www.dcc.ac.uk/
3 http://www.jisc.ac.uk/whatwedo/programmes/mrd/outputs.aspx
4 For the full survey report (currently only in German) see Simukovic et al. (2013a)


[Chart comparing the respondent groups "Professor" and "Research associate". Recoverable answer labels: "No, but I intend doing so", "No, I was not aware of such option", "No, I do not intend doing so in the near future", "No answer".]

Figure 1: Have you ever downloaded or cited research data? (Comparison) (based on Simukovic et al. (2013c))


3.2 Current Research Data Practice 


3.2.1 Data sources and types 


The nature of research data among departments and research institutes at HU has proved to be very
heterogeneous. Text documents were identified as a common university-spanning source and type of research data,
together with a wide distribution of databases, spreadsheets and images (see Figure 2 and Figure
3). More specifically, measurement series, statistical analyses, spectra, patient data and surveys were often in
use.


[Chart of research data sources. Legend: Observations, Experiments, Simulations, Images, Surveys and interviews, Statistics and reference data, Logfiles and usage data, Text documents, Other (please specify).]

Figure 2: Research data sources (Simukovic et al. (2013b))

[Chart of research data types. Legend: Images, Multi-dimensional visualisations and models, Audio recordings, Video recordings, Texts, Spreadsheets, Databases, Programmes and applications, Data specific for your field or instrument, Other (please specify).]

Figure 3: Research data types (Simukovic et al. (2013b))


3.2.2 Storage 


Primary storage locations as well as backups on further media indicated a predominant use of local options:
most respondents entrust their research data to the hard disk drives of their PCs or laptops and to external
hard drives or USB flash drives. Even so, backups were carried out comparatively frequently, mostly on a daily
or weekly basis. An interesting incidental finding emerged from free-text comments, which indicated that commercial
cloud storage is very commonly used when collaborating with partner institutions. This highlighted
the need for an alternative academic cloud service, as researchers increasingly cooperate and exchange or share
materials across institutional borders and physical locations.


3.2.3 Good scientific practice 

Apart from several further characteristics of how research data is currently being dealt with at HU, the
most controversial matter was safeguarding "Good Scientific Practice". Respondents were presented with
an excerpt of HU policy, stating that they are committed to preserving primary data underlying scholarly
publications for at least ten years. We then asked respondents to state whether they take these rules
into account and, optionally, to describe common practice in a comment field. Although the rules were passed as early as
2002, 20 % of respondents stated they did not know about them and a further 17 % were not familiar
with the current state of implementation in their working groups. Only about half of all respondents (56 %)
preserve their data as required by the policy. The information provided in comment fields shed light on
problematic aspects of these rules. Among the most widespread comments, researchers considered this obligation
unsuitable for the arts and humanities and therefore not applicable in their research field. Some respondents
had preserved their data without knowing of these rules, while another group asked for IT support
due to short-lived storage media. More importantly, some respondents argued they were not able to guarantee
a ten-year period due to prevalently short-term projects and job contracts. Sound arguments like these made
clear, on the one hand, that the university has to communicate its expectations of researchers more
efficiently and, on the other hand, that it has to provide institutional support so that researchers can comply with these rules.


3.3 Prospects for Institutional Services


3.3.1 Support and services needed 


Regarding institutional support at HU, researchers were asked to indicate services they would like to have. 
Among the answer options offered, the most desired support was "Secured and backed-up storage for my 
research data" receiving 277 responses. This was followed by "Advice & guidance on legal issues (e.g. access 
restrictions, sensible data, licensing)" (256 responses) and "Advice & guidance on technical issues (e.g. 
metadata, standards, long-term preservation)" (237 responses) (see Figure 4). 


[Chart of support and services needed. Legend: General issues, Publishing & citing, Technical issues, Legal issues, Specific issues, DMP, Storage, No need for support, Other.]

Figure 4: Support and services needed (Short version) (Simukovic et al. (2013b))


The results emphasized researchers' need for pragmatic support in first clarifying fundamental issues pertaining
to RDM. As funding agencies in Germany do not explicitly ask for a 'data management plan' when
researchers apply for funds, we observed no strong demand in this area compared to similar surveys at universities
in the UK. Accordingly, "Support on compiling a data management plan if requested by a research funder"
received only 122 responses and ranked seventh among the answer options.
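
To put these counts in proportion, they can be related to the total number of responses. The sketch below is illustrative only: it assumes that the 499 responses from Section 3.1 are the appropriate denominator and that respondents could select several options, neither of which is stated explicitly in the text. Only the four counts reported above are included, so the ranking printed here covers those four options and not the full list of answer options.

```python
RESPONDENTS = 499  # total responses from Section 3.1 (assumed denominator)

# Counts reported in the text; the other answer options are not listed there.
service_counts = {
    "Secured and backed-up storage": 277,
    "Advice & guidance on legal issues": 256,
    "Advice & guidance on technical issues": 237,
    "Support on compiling a data management plan": 122,
}


def share_of_respondents(count, total=RESPONDENTS):
    """Return the percentage of respondents who selected an option."""
    return 100 * count / total


# Sort options from most to least requested.
ranked = sorted(service_counts.items(), key=lambda item: item[1], reverse=True)

for name, count in ranked:
    print(f"{name}: {count} ({share_of_respondents(count):.1f} %)")
```

Under these assumptions, secured storage was requested by a little more than half of the respondents, while support for data management plans was requested by roughly a quarter.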


3.3.2 Supporting data sharing and archiving 


Furthermore, respondents were asked which type of data archive or repository they would most likely choose
to deposit or share their research data. A data archive within the researcher's department appeared to be
the most popular choice (216 responses), followed by an institutional data archive at HU (144 responses)
and an international subject-specific data archive (142 responses). Interestingly, a national subject-specific
data archive proved less popular (68 responses). Although a general preference for institutional
solutions prevailed at first glance, different use cases emerged. On the one hand, researchers want to
share their working materials, including research data, in a timely and flexible manner. As these are
unpublished results, access to such materials should be restricted to cooperation partners and ideally
administered by their home institution. As no sufficient academic solution exists yet, respondents often
stated that they use commercial cloud services such as Dropbox. On the other hand, researchers want to disseminate
final results in their communities. As research is increasingly being conducted on a global scale, researchers
strive to be read and cited internationally. This explains the preference for international subject-specific
data archives when it comes to reaching their research community.


4 Conclusion


Summing up, we observed a general willingness of researchers to share research data with their respective
communities or the public in the broader sense. However, major concerns were raised in the
survey. Amongst them are the protection of privacy, confidentiality and access restrictions, which have to be
resolved before granting access to sensitive data. Considerable effort is also needed to prepare data for
re-use, e.g. by providing appropriate data documentation or metadata. At this point, individual and
institutional roles and responsibilities have to be defined prior to setting mandatory policies.

At Humboldt-Universität we are currently developing an institutional concept to support
researchers in good RDM. In the course of this, strong preference is given to the most requested services, as
revealed by the top three positions in the survey results. Meanwhile we have drafted an RDM policy in
addition to the “rules of good scientific practice” at HU (to be passed by the Academic Senate in the near
future) in order to reinforce commitment to a common strategy from all parties involved, including the university
library, IT services and the research support office.

When developing RDM services, HEIs need to find the right balance by taking different issues into 
account. It starts with keeping in mind the local setting of their own institution and the global nature of 
the research being performed at the same time. Establishing a shared view of all different stakeholders is 
another challenge. Moreover, as most HEIs are home to several scientific disciplines and research fields, 
different ways of working have to be taken into account. We have learned a lot about research practice at
HU and gained invaluable insights into researchers' daily work through direct contact, most notably through
follow-up interviews. We also observe an increasing number of inquiries from other scholarly institutions in Germany
interested in providing particular support or conducting a similar survey in their own environment. While
learning by doing, we hope to convince more scholarly institutions to provide pragmatic RDM support and
to benefit from collective efforts.


Abstract

The question of how best to measure culture in a location is intensely debated in the arts sector and in 
the adjacent disciplines of urban planning and economic policy. Our research examined a wide array of 
methods of counting culture and discovered a slew of literature on measuring high-level financial, 
employment, and census statistics in the sector, but found no systematic means of measuring culture at 
a granular level as it is experienced by citizens. Our research hypothesizes that information about cultural
events can be aggregated using mixed methodologies to produce a previously-unseen view of culture in a 
community that may be used in future research to give insight on how culture differs and compares 
across locations. A pilot project conducted in winter 2012-2013 in Los Angeles tested new methods for 
identifying sources of cultural event data, aggregating and normalizing them, and evaluating for 
comprehensiveness, feasibility, consistency, and sustainability. 


1 Introduction


In January of 2013, Jamie Bennett, then Chief of Staff at the National Endowment for the Arts, said "we
generally have a research and data problem in the arts. Our data sets are often not as robust, and our
research is not always seen as being as rigorous as other sectors’" (Hessenius, 2013). Bennett’s statement
has been widely discussed in the community of arts and cultural professionals, and little disputed. In fact, 
the problem of our sector’s failure to quantify its output and impact has been the subject of concern for 
decades by our top scholars, advocates, philanthropists, and practitioners. 

One possible reason for our sector’s failure to use data-driven evidence to describe and understand 
the field is the reluctance of arts and culture professionals to quantify creative work. Studies of cultural 
impact and learning that use qualitative methodologies abound, while research that relies on data and 
statistics, particularly real-time data, lags. Furthermore, consensus on a definition of culture
flexible enough to be inclusive across various spectrums (high/low, professional/amateur, visual/performing
arts) but specific enough to delineate a universe of discourse does not currently exist. Finally, until recently,
only a handful of practitioners and researchers in the field had the skills and training to support the data 
collection, management, and analysis work that has in recent years become the hallmark of the “big data” 
movement. 

This is not to say that attempts have not been made to measure the economic or educational impact 
of cultural activity, but such efforts have often been narrowly focused, or built from small samples that cast 
doubt on the comprehensiveness or representational value of the data that serve as evidence. A review of
current methods of measuring or counting culture is presented at length in an environmental scan prepared
in anticipation of the pilot program and updated at its conclusion
(http://projectaudience.org/wp-content/uploads/2013/11/Cultural-Benchmarking-Environmental-Scan-2013093020131206.pdf).

The objective of the scan was to review the current field of cultural metrics; to describe any gaps 
that might exist in the evidence base for measuring cultural activity; and to create a context for our project 
team’s work on developing a feasible, real-time methodology for measuring and analyzing cultural output 
on a more granular level. 

Our scan of ongoing cultural data collection activities, indicators projects, and published research 
suggested that a real need exists for a robust, real-time data set that can provide a detailed, complete, and 
localized view of cultural offerings available to citizens. Our team imagined that a
sustainable, comprehensive, real-time database containing every cultural event in a given location would
capture the available cultural inventory of a place and allow it to be analyzed and displayed by
amount, location, category, price, and audience, thereby providing a picture of the scope, nature, and
location of cultural production in that place.


2 Impact 


Other cultural indicators projects take a high-level view of cultural impact, examining spending, funding, 
or employment figures and calculating their effect on local, state, or national economies. Such approaches 
assume a level of stability and infrastructure that may not accurately reflect the full scope and richness of 
culture in a community, particularly culture at the edges. In contrast, our interest was principally in the 
direct impact that cultural production has on the lives of citizens of and visitors to a community. However, 
the varied use cases that our team has imagined also promise to have immediate utility for cultural 
policymakers, urban planners, local businesses, philanthropists, arts and education advocates, and 
researchers and to remake the way that these communities think about culture and the creative industries. 
A sampling of the potential uses of a comprehensive events data set, suggested by team members and others 
who have taken an interest in our work, includes: 


1. Comparing the residential potential of neighborhoods based on their cultural offerings. We imagine 
a kind of “culture score,” much like the well-loved walk score (www.walkscore.com). Walk score 
calculates the walkability—but not the cultural richness—of an address; a similar scoring process 
could be created with event data to help families and businesses think about the relative appeal of a 
neighborhood or even a block, based on the availability, variety, and affordability of nearby cultural 
offerings. 

2. Developing real-time map layers to help travelers to choose neighborhoods to visit or stay in based 
on their own cultural interests 

3. Helping local municipal leaders and businesses to better develop and promote their resources and 
advocate for development resources 

4. Providing advocates with tools to lobby for arts funding 

5. Supporting transportation and other infrastructure planning by providing urban planners with 
detailed information about the frequency, timing, and clustering of cultural activities and by 
identifying culturally-rich neighborhoods that are underserved by transit and other municipal 
resources 

6. Providing funders with a tool for supporting under-served communities by providing a map of 
cultural deserts 

7. Developing apps and other interactive, location-based calendaring tools that draw on hyperlocal 
data to deliver personalized recommendations and experiences 

8. Helping cultural organizations make planning and programming decisions 

9. Supporting scholarship into the growth and development of cities 

The project’s populist approach to the needs of cultural audiences is mirrored in our goal to create a toolset 
that will allow citizens to help describe and define the cultural life of their neighborhoods by contributing 
local event information to the data set. We believe that a cultural data set developed largely for citizens 
and built, at least in part, by citizens has the potential to transform the ways in which people value and 
participate in cultural activities in their neighborhoods and cities. 


3 Los Angeles Pilot Project 


Our team developed a plan for a pilot data collection process to evaluate the feasibility of creating a data 
set consisting of the preponderance of cultural events in a particular location. Though the pilot team had a 
clear understanding of some of the key obstacles to the collection of a large-scale events-focused data set, 
we chose to move ahead with a pilot that would use human resources to harvest, normalize, and analyze 
events taking place in a one-month period in Los Angeles in order to gain a finer sense of the difficulties 
involved in such an undertaking. Our work with this events snapshot would be a first step towards 
developing automated methods for creating a sustainable, real-time data set. The team’s goals 
for the pilot were therefore threefold: 1) generate a methodology for capturing the full scope of cultural 
events in Los Angeles County for a limited period of time; 2) test the methodology for completeness and 
measure the difficulty of obtaining data of various types and from diverse sources; and 3) conduct open 
discussions in multiple communities about improving and ultimately automating the methodology and 
sources, determining the utility of the information, and identifying use cases for the data set. 

Los Angeles County was chosen as our study site for both demographic and practical reasons. L.A. 
County, the most populous county in the United States (“U.S. Census Bureau Releases 2010 Census 
Results,” 2011), is geographically vast and ethnically diverse. In selecting it for the pilot, we were guaranteed 
a wide range of event types produced for a diverse and culturally-engaged audience. We were confident that 
many of the complicated problems of definition and classification that we might encounter in other cities 
would be surfaced in a pilot study in L.A. In addition, we had access in Los Angeles to a large online 
community events calendar, ExperienceLA.com, built by a coalition of Los Angeles municipal and arts 
agencies. Arts organizations funded by L.A. County contribute their cultural events to the 
ExperienceLA.com calendar on a compulsory basis, making its data set a useful starting point for our 
research. Finally, longstanding colleagues at the Graduate School of Education and Information Studies at 
UCLA had expressed an interest in collaborating with us to provide field researchers to support our data 
collection efforts, and our ability to capitalize on these cost-free resources (the graduate students involved 
with field research were given course credit for their work) helped to make the study affordable. 

For purposes of the Los Angeles pilot, we developed a definition of culture as “the expression of a 
creative endeavor.” To establish consistency in the use of the definition, we worked as a team to apply it 
to specific events or event classes, continuously testing our choices for consistency and repeatability, and 
engaging in sometimes spirited debate with members of Los Angeles’ cultural community about what would 
be included. Sporting events were not included. While sports are often played creatively, we felt that 
sporting events did not meet the definition of having creative effort as their primary goal. Within this 
particular study, for both definitional and logistical reasons, high school and college theater, public art, and 
other forms of cultural production were not captured. However, when replicating the study in a different 
location, especially one with a smaller population, these same elements might not only be feasible and 
appropriate to collect but may also comprise some of the most important cultural products a community creates. 

The team established “event” as the study’s core unit of measure. We considered an event to be an 
occurrence with a defined start and end time and a fixed location, thus creating a standard of measure that 
was easily and quickly collected and quantifiable for analysis. For example, each day that an exhibition in 
a museum was open to visitors was counted as a single, unique event. In the case of festivals and fairs, each 
discrete event, such as a screening of a movie in a film festival, counted as one event. 
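
The counting rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the procedure the field team used (they parsed the data by hand); the `Event` fields, the `expand_daily` helper, and the sample exhibition are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, time, timedelta


@dataclass(frozen=True)
class Event:
    """One countable event: a fixed venue with a defined start and end."""
    title: str
    venue: str
    day: date
    start: time
    end: time


def expand_daily(title, venue, first_day, last_day, opens, closes, closed_weekdays=()):
    """Split a multi-day run (e.g. an exhibition) into one Event per open day.

    closed_weekdays uses date.weekday() numbering (Monday == 0).
    """
    events = []
    day = first_day
    while day <= last_day:
        if day.weekday() not in closed_weekdays:
            events.append(Event(title, venue, day, opens, closes))
        day += timedelta(days=1)
    return events


# Hypothetical exhibition spanning the pilot's 31-day frame, closed on Mondays.
exhibition = expand_daily(
    "Sample Exhibition", "Sample Museum",
    date(2012, 12, 15), date(2013, 1, 14),
    time(10, 0), time(17, 0),
    closed_weekdays=(0,),
)
```

Under this rule a single exhibition contributes one event per open day, so long-running shows weigh heavily in any simple count.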

Our aim was to cast a wide net, capturing not only establishment culture but also 
cultural events across the continuum, regardless of professional status, funder, or cost. Based on this 
definition, burlesque, poetry slams, and stand-up comedy were all included, under the rubric of individual 
performance. These edge cases served a vital role, helping the team to validate the consistency and robustness, 
as well as relevance and currency, of our categories and sub-categories of events, and forcing us to confront 
and consider the range of activities in a constantly evolving creative landscape. 

Certain categories of events were excluded, among them those considered by our local participants 
to fall outside of the local community’s definition of culture (high school, college, religious and political 
productions, libraries, sporting events, pornography, and trade shows). Public art and architecture, lacking 
fixed start and end times, also fell outside of the scope of the study, although tours of public art and 
architecture were included. Most team members found the choice to exclude public art and architecture unfortunate and 
have expressed an interest in trying to capture both in future iterations of the study. 

Although the project team recognizes that the creation of useful permanent databases of cultural 
event information will require robust and sophisticated automated tools for data processing and data 
management, our pilot project’s design was purposefully hand-wrought. Our goal was to use a field team to 
perform the tasks that automated tools would (for the most part) handle in future and to record in detail 
the requirements for acquiring, deduplicating, and normalizing data that will eventually be programmed as 
data-handling algorithms. Only with the use of human insight could we be certain that any requirements 
definition for future tools is complete and accurate. 
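
As a rough sketch of what the normalization and deduplication requirements might look like once programmed, the Python below merges listings that describe the same event. The matching rule (identical normalized title, venue, day, and start time) is an assumption made for illustration, as are the record fields and source names; the pilot team's actual rules were worked out by hand and are not specified here.

```python
import re
from datetime import date, time


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace for matching."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(records):
    """Merge records that share a normalized title, venue, day, and start time.

    Each record is a dict with keys: title, venue, day, start, source.
    Returns one merged record per key, listing every source that carried it.
    """
    merged = {}
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["venue"]), rec["day"], rec["start"])
        if key not in merged:
            merged[key] = {**rec, "sources": [rec["source"]]}
            del merged[key]["source"]
        elif rec["source"] not in merged[key]["sources"]:
            merged[key]["sources"].append(rec["source"])
    return list(merged.values())


# Hypothetical records: the same concert listed by two sources, plus one other event.
records = [
    {"title": "Holiday Concert!", "venue": "Sample Hall", "day": date(2012, 12, 20),
     "start": time(19, 30), "source": "calendar_a"},
    {"title": "HOLIDAY  CONCERT", "venue": "sample hall", "day": date(2012, 12, 20),
     "start": time(19, 30), "source": "weekly_b"},
    {"title": "Poetry Slam", "venue": "Sample Cafe", "day": date(2012, 12, 21),
     "start": time(20, 0), "source": "weekly_b"},
]
unique = deduplicate(records)
```

Exact-key matching of this kind would miss listings whose titles or venues are worded differently across sources, which is one reason human review remains useful when defining requirements.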

The sampling frame consisted of all cultural events in Los Angeles County during the period from 
December 15, 2012 through January 14, 2013. This 31-day frame was chosen based on a balance of 
expediency and variability. The time span across the holidays allowed seasonal events to be represented, 
including organizations that may have events only once a year, or organizations that might host different 
types of events to appeal to holiday tourists or audiences that take part in seasonal events. The research 
team felt this approach might provide both a diversity of users and a variety of events. The extension of 
the data collection period into 2013 allowed the team to gather data during what was a less busy time, 
balancing the uncharacteristically busy holiday season with a quieter period in early January. 

The actual process of data collection was done by hand on a daily basis by the field staff, a 
time-consuming practice. For most of the data collection, field team members used the data scraper 
OutWit Hub to extract data from source web pages, although some events, acquired from non-digital (or 
non-scrapable) sources, were simply entered into the collection spreadsheet manually. In most cases, 
information still required some individual formatting, which was done on the fly by the field staff. New 
events were added to online sources up until the date of the event, meaning that recurring review of each 
of the key sources was required; an understanding of how often each source added new content 
informed the frequency of return visits. After the initial 31-day period lapsed, 
the team went back into the scraped data and further parsed occurrences of movies, museum exhibitions, and gallery 
shows into daily units. This time-consuming and tedious process was a necessary step in helping our team 
to understand the specific requirements of data collection, parsing, and normalization that will inform a 
later, automated phase of event data collection. 


4 Discoveries From the Pilot Project 


This research project represents the rare study in which the process and methods developed were far more 
important than the data generated. In the spirit of generating a test methodology for further applied 
research within the field, we moved forward rapidly into the data collection phase, knowing that both the 
data collected and the methodology would be imperfect, but feeling that this was the surest method of 
developing a more robust methodology for the next iteration. We hypothesize that the data we’ve collected 
represent certain biases relative to the nature and amount of cultural activity within Los Angeles County. 


iConference 2014 Susan Chun et al. 


The most significant form of bias within the data is likely the sources from which it was drawn, 
which were largely online in the form of publications (LA Weekly), event aggregation sites (Los Angeles 
editions of For Your Art, Yelp, etc.), and RSS feeds (particularly those issued by event ticketing agencies). 
The research team has some information that suggests that the sources that were not tapped for data 
collection in the pilot might contain a disproportionate number of cultural events for specific ethnic 
communities, some of which make up a significant population segment within L.A. County. For instance, 
broadcast media sources like television and radio commercials were not incorporated in the pilot but were later 
identified as a major information source for non-English speaking ethnic communities. 

Arguably, the definition of culture as employed within this study could be cast more widely (or 
more narrowly) to present a very different picture of the cultural event activity within the area. Key 
elements for this particular sample were the decision to include commercial film showings (a decision made 
after discussion with residents and local cultural professionals about the importance of the moviemaking 
industries to Los Angeles); the decision to exclude public visual art and architecture for definitional reasons; 
and the decision to exclude science museum and historical institution events for the purposes of making the 
study more feasible. 

Data collector focus was another source of potential bias. As one member of the data collection team noted, 
“Realizing that there is such a large number of events occurring throughout Los Angeles County on a given 
date and that collecting data for each event would take a considerable amount of time, I had to make 
decisions about when to stop and what information not to collect. There are definitely gaps in the data 
collected, and this only speaks to data collected from one source.” She went on to write that she saw the need 
for a larger team of field workers, since a single category could include more than 100 events on a given 
day. 

Finally, the time frame selected may bias the sample towards certain types of events, either by 
organization, venue, or category. While we attempted to find a sampling period that would encompass the 
full scope of the diversity of events, it was certainly vulnerable to seasonality. 

While gathering the event-based data during the one-month data collection period, we hypothesized 
that certain events had gone uncaptured because they were publicized through methods we had not 
considered, or because entire categories of events had been overlooked in our planning. The research process 
had been designed to incorporate a phase of validation, in which we would attempt to determine the 
completeness of the data collected during the one-month pilot by contacting individual producing 
organizations in each of several sample zip codes directly and comparing their events calendars to our data 
set. We hoped that the validation process would reveal types of events, event sources or organizations, and 
event audiences that were unknown to us and allow us to revise our future methodology to anticipate these 
overlooked events. 
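
The intended comparison can be expressed as a simple set difference. The sketch below is illustrative only: the `coverage` function, the event keys, and the sample listings are hypothetical, and it assumes both sides have already been normalized to comparable keys.

```python
def coverage(collected, reference):
    """Compare collected events against a reference calendar.

    Both arguments are iterables of hashable event keys, e.g.
    (title, venue, day) tuples that have already been normalized.
    Returns the captured share and the sorted list of missed keys.
    """
    collected, reference = set(collected), set(reference)
    if not reference:
        return None, []  # nothing to validate against
    missed = sorted(reference - collected)
    return 1 - len(missed) / len(reference), missed


# Hypothetical calendar from a producing organization (the reference)
# and the matching slice of the pilot data set (the collected events).
reference = [
    ("holiday concert", "sample hall", "2012-12-20"),
    ("poetry slam", "sample cafe", "2012-12-21"),
    ("gallery opening", "sample gallery", "2013-01-05"),
    ("film screening", "sample theater", "2013-01-10"),
]
collected = reference[:3] + [("comedy night", "sample club", "2013-01-12")]
share, missed = coverage(collected, reference)
```

A share computed this way is only as trustworthy as the reference list: an empty or incomplete roster of organizations yields no usable estimate.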

The validation process proved to be more difficult than anticipated. We used the Cultural Data 
Project data set (http://www.culturaldata.org/), a resource we had not used for data collection, as one of 
the validation sources. A second source was listings provided by Arts for LA (http://www.artsforla.org/), a 
non-profit regional advocacy organization that supports arts, culture, and arts education. We received 
participant and member listings from each resource for a group of randomly selected zip codes within L.A. 
County, and canvassed the organizations in the validation set for information on events they had produced 
during our collection timeframe. Initial attempts to cross-reference these two data sets failed when the 
validating sources had no cultural organizations within the selected zip codes. We initiated a second round 
of validation, this time purposely sampling zip codes we knew to be rich in cultural institutions. We 
contacted the small number of organizations that had not been represented within our data set, only to 
discover those organizations had sponsored no events during the time period we were studying. 

While one might conclude that our data set was therefore exhaustive, we believe that not to be the 
case. Rather, the validation results point to a lack of reliable census data for cultural organizations. As our 
sources, techniques, and data improve through iteration, we expect our data set to become larger and more 
diverse, although until more complete rosters of cultural organizations become available to us, it may be 
difficult to confirm with precision what share of cultural events are being captured. 

In keeping with the threefold goals of the pilot project, we identified areas for redress in the next 
iteration: 

• a greater degree of automation, as our current collection methodology is time-consuming and 
  subject to researcher bias 
• expanding the data collection to include longer time frames and different seasons 
• development of better validation practices 
• cooperation with under-represented communities, particularly ethnic and non-English speaking 
  groups 
• incorporation of non-Web-based sources for event information 
• refining definitions of culture and changing the unit of measure from an event to a culture-hour 


5 Conclusion 


The pilot project met the team’s initial goals of developing a methodology for capturing a variety of cultural 
events from a range of sources, for testing that methodology for completeness and ease, and for conducting 
conversations with members of the local community about enhancing our methods in future phases. We 
acquired a nuanced understanding of the gaps in data based on our current methodology and identified a 
number of items for review, particularly the establishment of a more consistent core unit of measure, and 
the need to identify sources for events that may never be published in electronic formats. The pilot project 
provided clear indications of the shortcomings of our current methodology as well as potential solutions for 
addressing them in future iterations. 


This paper looks at the extent to which information is performative, meaning that in addition to having 
a role in gathering, evaluating and/or circulating data, information is also deeply tied to identity work. 
Drawing on interviews with 26 participants, all of whom had moved to New York City in the last two 
years, I analyze references to using technology — specifically mobile technology — in order to avoid looking 
like a tourist. From an urban informatics perspective, this phenomenon provides a means of opening up 
discussion into the inter-relatedness of people, technology and urban space. From an HIB perspective, 
my discussion offers a means of addressing the performative nature of information, which is vital to 
understanding information in the context of everyday life. 


1 Introduction 


Part of the social turn within information science (IS) as a discipline has been to recognize that locating 
and evaluating information represents only one part of analyzing human information behavior (HIB); more 
broadly (and more complexly) HIB also comprises what people do with information in daily life (Budd & 
Anstaett, 2013; Case, 2012) and how it shapes their movements to and relationships in the world (Jaeger 
& Burnett, 2010). In her work on the information practices of women inmates in a high-security prison, 
Chatman (1999) argued that “information is really a performance. It carries with it a specific narrative that 
is easily adaptable to the expectations and needs of members of a small world” (p. 208). This sociological 
approach to understanding information as a performance provides a framework for analyzing how different 
socio-technical practices are read as normative versus non-normative based on highly localized social 
contexts, what Chatman would call small worlds. In this paper, I consider the performative nature of 
information by drawing on one facet of everyday life for transnational migrants to urban areas: the use of 
technology to avoid looking like a tourist. In this instance, information needs are intertwined with a desire 
to perform a particular kind of identity — in this case, to perform a sense of belonging in and familiarity 
with New York City. 

After a brief introduction to HIB research on transnationalism, I outline the qualitative methods 
used to gather data for this investigation. I then present findings. Given the brevity of this note, my object 
is to use this specific set of practices as a way of articulating the performative function of information and 
point to implications for future work and HIB theory. 


2 Context 


As of 2008, more people worldwide were living in cities than not (United Nations, 2008). This process of 
urbanization has provoked both journalistic and academic inquiry, alternatively focused on the economic, 
environmental and socio-cultural implications of movements of people to cities. Within LIS, research specific 
to urban environments is largely rooted in investigations of urban libraries,¹ such as Agosto and 
Hughes-Hassell’s (2005; 2006) work on everyday life information practices of urban teens, Fisher, Durrance and 
Bouch Hinton’s (2004) research on immigrants’ use of public libraries in Queens, New York and 
Fenster-Sparber’s (2008) work on teen reading in a juvenile detention center in New York. More specifically relevant 
to my project, Agada’s (1999) work on gatekeepers in inner-city Milwaukee proved influential in encouraging 
work on non-dominant groups, some of which has focused on city life (e.g. Caidi & Allard, 2005; Caidi, 
Allard & Quirke, 2010; Cheong, 2007; Fisher, Durrance & Bouch Hinton, 2004; Johnson, 2007; Savolainen, 
2007; Spink & Cole, 2001). In contrast, I look at information practices embedded in everyday urban life 
among transnational newcomers. As newcomers to the city, transnational migrants are confronted with 
tasks of making sense of city space, part of which entails learning not just routes between locations but also 
how to act in city space. In this brief paper, I look specifically at this dilemma of performing familiarity 
with city space as a way of thinking about the social life of information. 


3 Methodology 


My analysis is taken from a larger investigation of information practices of transnational migrants in the 
New York City metropolitan area. Interviews were conducted between December of 2011 and September of 
2012. Participants were recruited through two methods: 18 were recruited from English as a Second 
Language (ESL) programs in and around New York City. A second group of eight participants were located 
through word-of-mouth recruitment, and consisted of current or former graduate students from countries 
outside the United States. In both cases, participants were screened based on the length of time that they 
had lived in New York, limiting the interview pool to those who had arrived within the last two years. In 
total, the 26 participants hailed from 20 different countries and ranged in age from 22 to 60. For details on 
participants, see Table 1. 


1 For an activist rather than academic perspective, see Urban Librarians Unite, formed in 2010 (urbanlibrariansunite.org). 


Participant Details 

Name      Age   Nationality          Residence in NYC   Profession 
Alice     26    Australia            18 months          Barista 
Amelie    27    France               2 weeks            Unemployed 
Araceli   29    Mexico               7 months           Hostess 
Cecille   33    Cameroon             2 years            Teacher 
Ishmael   42    Togo                 6 months           Unemployed 
Julio     60    Dominican Republic   9 months           Unemployed 
Jorge     29    Spain                6 months           IT worker 
Juan      37    Uruguay              3 months           Factory worker 
Kiki      32    Japan                1 year             Nanny 
Lalo      35    Chile                4 months           Auto mechanic 
Laura     34    Brazil               1 year             Unemployed 
Luka      29    Georgia              2 years            NGO 
Miao      36    China                1 month            Student 
Midori    38    Japan                3 months           Research fellow 
Nalan     32    Turkey               2 years            IT worker 
Noely     32    Venezuela            1 year             PR person 
Raul      22    Honduras             18 months          Student 
Rob       28    Puerto Rico          1 year             Delivery person 
Carla     26    Philippines          2 years            Student 
Dinan     25    India                18 months          IT worker 
Gia       32    Indonesia            2 years            Journalist 
Jacinta   33    Mexico               2 years            NGO 
Javier    26    Brazil               18 months          Student 
Lu        28    Mexico               1 year             NGO 
Sue       29    Korea                2 years            Student 
Wen       26    China                1 year             Student 


Table 1: Participant details include age at time of interview, 
country of origin and the length of time in New York. Julio 
declined to provide his age, but we estimate him to be about 60. 
As discussed in the methods section, participants were recruited 


Interviews lasted between 45 and 90 minutes and were audio recorded. Interview questions focused on 
resources for tasks like finding an apartment and locating ESL classes, as well as technologies for keeping 
in touch with people abroad and the role of SNSs in everyday urban life. Transcripts were coded using an 
emic/etic coding strategy (Miles & Huberman, 2009), which involves creating a series of high-level “etic” 
codes corresponding to themes identified prior to coding: urban space, changing relationships to space over 
time, personal networks and technological practices. “Emic” subcategories were then nested underneath, 
reflecting participants’ own terms and conceptualizations of those categories. A number of themes grew out 
of this process, including information practices used to become familiar with city space, relationships to the 
city in terms of navigational as well as ethnicity-based landmarks, and technology and identity work. In 
this paper, I present findings related to this third area, concentrating on the use of technology to mask 
participants’ status as newcomers. 


4 Findings 

Across interviews, the figure of the tourist drew remarkably similar associations,² including naivety, 
haplessness and ignorance. Interestingly, Cecille (33, Cameroon) explicitly linked taking pleasure in being 
in the city to being a tourist: “You have to be a visitor, a tourist to enjoy the city. If you live in the city, 
you can’t enjoy the city.” Julio (60, Dominican Republic) echoed these sentiments, commenting “I know 
the place [New York] as a tourist, but as a resident, it’s another status.” Julio went on to elaborate on the 
different experiences of New York as a visitor versus as a resident: 


When you come here as a visitor, all the people are very friendly. But later, when you come here, 
to keep family here, it’s another story... Like visitors enjoy the day, maybe they stay here as a 
permanent resident, [but] it’s a lot of money, because no matter what New York is very expensive. 


These accounts suggest a trajectory of identity that moves from tourist to resident and also from leisure to 
labor. They also point to an awareness of difference in privilege ascribed to the identity of natives versus 
newcomers. In the following sections, I describe the role of technology in terms of the desire to avoid looking 
like a tourist. 

Participants overwhelmingly ascribed a kind of vulnerability and foolishness to tourist identity, and 
many described specific uses of technology to avoid the appearance of possessing those traits. For example, 
Amelie (27, France) made it a habit to consult online information about public transportation, specifically 
the Metropolitan Transit Authority (MTA), prior to leaving the house: “I go to the website of MTA to 
know how to have a card, because I don’t want [to look] like a tourist.” Until information about transit is 
sufficiently ingrained, technology becomes an important proxy that spares newcomers from 
having to ask a stranger or look at a map while at a subway station. Amelie’s reluctance to display a lack of 
knowledge was echoed by Sue (29, Korea), who noted that the privacy of mobile technology allows for the 
concealment of her newcomer status: 


I don’t know if I’m comfortable with the paper map anymore, the big paper map, because that’s 
kind of a sign that I’m a tourist. So to hide that, I just use the mobile technology to pretend [to 
be] the native person in that area ... I don’t want to be seen as a tourist in any city, that’s why I 
just want to use [the map] privately, rely on the mobile technology. And avoiding asking some 
person, any person on the spot. 


For Sue, mobile technology is not only useful for providing information about urban space; it also 
keeps her information practices private, avoiding the vulnerable display of lacking information. In these 
accounts, outing oneself as a non-native is avoided by careful arrangements of technology,³ as in Sue’s 
surreptitious referencing of mobile apps or Amelie’s pre-emptive consultation with the MTA website prior 
to leaving the house. These negotiations reflect how information behavior can be at once technological, 
performative and relational, as a deliberate attempt to distance oneself from appearing to be a tourist. 

With extended experience in the city, participants frequently came to contrast their current 
understanding of city life with that of tourists, a discursive maneuver that distances their present self (as 
acculturated to and knowledgeable about the city) from a touristy past. Yet, even as participants sought 
to distance themselves from tourists, for those who had visited New York prior to migrating, there was 
often a fondness in recounting initial, pre-migration trips to the city. For example, Giselle (30, Philippines) 
contrasted her initial visit to New York in terms of technology with her current practices: 


2 My favorite description of tourists came from Raul (22, Honduras), who characterized tourists in the following way: “they are wearing 
a backpack, sunglasses, and they have a lot of water.” 

3 Given the increasing popularity among tourists of using PDAs such as iPads to navigate city space (see Tucker, 2013), it’s interesting 
to consider how the figure of the tourist will continue to evolve in terms of technology (or the lack of technology) signaling either 
belonging or outsider-status. 

[Using Google Maps is] useful, so you kind of just don’t get lost, but sometimes I forget even to 
look around. When I was here actually two years ago, before having a smart phone and I was just 
a tourist, I literally walked from midtown all the way to the Met ... And I found my way around. 
Granted it took me like the whole day, but I was just like, “oh, wow,” looking around. And that 
was great, also because I wasn’t doing anything, I mean I was a tourist and I had all the time to 
walk and think and stop and go here and go there. Google maps helps if you just have to be there. 


It’s interesting that in her description of first visiting New York, Giselle couples “before having a smart 
phone” and being “just a tourist,” collapsing the two conditions. This underscores Sue’s argument that 
being without a smartphone leaves unappealing options of having to ask for help from strangers or carrying 
a paper map, beacons of newness in city space. Another way of reading Giselle’s constructions of being a 
tourist versus being a resident is to suggest that for tourists, the use of technologies to maneuver through 
urban space is almost entirely informational, whereas for those attempting to distance themselves from the 
figure of the tourist, technology provides an additional, performative function of concealing newness. 

As a final example of participants distancing themselves from tourists, Rob (28, Puerto Rico) described 
motivations for familiarizing himself with city life specifically in terms of being a non-tourist: 


My first week, I didn’t know what I was doing. I just let my body go and go with the flow, just 
intake people, like not look like a tourist ... Because I know I’m not going to be a tourist, this is 
going to be my home, I was going to stay here for a long time. Living in another place, it’s such a 
drastic change, I wanted to make it part of me as quickly as possible. 


For Rob, a crucial component of identity work included monitoring the behaviors of those whom he perceived 
to be native New Yorkers. When I asked for specifics about this monitoring, Rob explained, “So I started really 
being observant and watch how New York life is to get adaptive, and not be so lost.” In order to avoid 
looking (or feeling) like a tourist, Rob canvassed the practices of those around him that he perceived to be 
city natives. Again, the figure of the tourist is important in giving transnational migrants a counter-narrative 
for identity work, a characterization to avoid as deliberately as possible. 


5 Discussion 


What drives participants’ discomfort with tourist identity? The reluctance to admit a lack of information 
stems from the display of vulnerability or alterity (Chatman, 1999; Hamer, 2003; Hasler & Ruthven, 2011), 
but there are some additional complications in the accounts discussed here. Interestingly, the construction 
of tourists as not just socially but spatially naive is echoed by Lefebvre (1967/2007), who wrote of 


the archetypal touristic delusion of being a participant in [space], and of understanding it 
completely, even though the tourist merely passes through a country or countryside and absorbs 
its image in a quite passive way. The work in its concrete reality, its products, and the productive 
activity involved are all thus obscured and indeed consigned to oblivion. (p. 189) 


Lefebvre’s definition hinges on participation, where tourists are distinct from natives because of their 
passivity and lack of engagement with space. For Lefebvre, it is impossible for tourists to understand the 
inter-relationships between space and social practice. 

Participants echoed many facets of Lefebvre’s construction, in that they associated tourists with 
transitory visiting, with pleasure rather than work, and with a whirlwind, predictable breadth of “tourist traps” 
rather than a few “hidden gems” or self-made discoveries. I would argue that the transnational migrants I 
interviewed were resistant to the “placelessness” associated with mass tourism (Wearing, Stevenson & 
Young, 2010, p. 20), wanting precisely to assert claims to city space. As such, tourist identity does a 
disservice to the sustained engagement and difficult work required to stake claims of familiarity with city 
space. The stakes are, partly, showing commitment to the city; tourist identity, for participants, denoted 
an ignorance that is declared openly and uses human informants rather than technology to fill in gaps in 
knowledge. In this way, technological practices of performing belonging fit into participants’ commitment 
to the city, what Massey (2005) might call a commitment to transitioning from space (as static and 
essentialized) to place (as having personal, affective and social connotations). Spatial practices are also 
technological practices: tracking the relationship to technology across participant accounts of 
everyday urban life generates insight not only into the technical functionality that is or isn’t useful in navigating 
space, but also into the social stakes of leveraging technology to perform familiarity rather than alterity. 
This reflects Chatman’s argument that from a sociological perspective, “Information has little to do with 
data. It means nothing at all if it is not part of a system of related ideas, expectations, standards and 
values” (Chatman, 1999, p. 209). 

There are advantages to tourist identity, which can enable someone to avoid seeming like a threat 
(as demonstrated by an Occupy Wall Street tactic of dressing like a tourist (Occupy Wall Street, 2012)). 
Also, the apt use of technology should not be positioned as universally sufficient to pass effectively from 
one group to another; ethnographic research is riddled with accounts of attempts (and frequently failures) 
to pass in a social context outside one’s own, many of which would be unaltered by a well-timed display of 
technological proficiency. As a whole, my argument is neither that tourist identity is universally to be 
avoided nor that technology is universally sufficient to pass as a native, but rather that looking at technology 
as a tool of social performance and not just a tool of information provides an insightful lens for interrogating 
boundaries of belonging and privilege. 


6 Conclusions 


Experiences of transnational migration required participants to make sense of their identities in a new space 
(or really, set of spaces) and also to produce and reshape identities as part of the process of maintaining 
relationships abroad. In this brief paper, I have examined HIB in terms of performance, where participants 
used technology, particularly mobile technology, to hide their status as newcomers. The dual functionality 
of obtaining information and then using that information to produce or perform a particular identity 
becomes evident as newcomers leverage a range of technologies — from the MTA webpage to smart phone 
apps — to produce information about the city as well as themselves, and to undertake identity work that 
situates them as having a relationship of belonging in and familiarity with the city. These practices 
were one of the clearest indications of the extent to which locating information about city space is only 
partly about information, and moreover has a powerful link to managing identity. 
1 Introduction 


The Digital Archives and Marginalized Communities Project (DAMC) is a Social Sciences and Humanities 
Research Council (SSHRC) funded collaboration that uses digital information systems to highlight and 
interrogate the complex and related topics of colonialism in Canada, violence against Indigenous women 
and girls, and sex work. The project is in the early stages of development; this paper therefore explores how 
the project’s interdisciplinary theoretical framework and methodology influence the development of digital 
information systems that embed community ontologies and epistemologies into their overall design, 
organization, and record appraisal and description, while also meeting broader project anti-violence and 
social justice objectives. 

We are developing three separate but related digital databases/archives using a participatory design 
process with stakeholder groups. Working titles for the archives are the Missing Women Database (MWD), 
Sex Work Database (SWD), and Post-Apology Residential School Database (PARSD). The archives will 
house related academic research, print and visual media, on and offline activism, commemorative initiatives, 
and image collections. As our relationships with the communities involved with these collections develop, 
so do the collections themselves. 

The project aims to investigate how communities can adopt digital information platforms and 
systems that reflect community-derived epistemologies, ontologies, and social justice objectives. 
Our overarching objectives are: to create and mobilize—via multiple forms of digital media—knowledge 
that contests and re-envisions conceptions of violence against certain people as normal; to build bridges and 
dialogue between academic and non-academic stakeholders using on and offline tools such as knowledge 
sharing, new social media, online and ‘real world’ conference participation, and the opportunity to curate 
digital exhibits together; and to create community-based archives that preserve community-identified 
cultural heritage. 

Individual objectives for the archives vary according to the interests of the groups involved. Ongoing 
consultations continue to refine the objectives for each initiative. Currently, MWD and SWD exist to 
preserve the voices and work of missing and murdered women’s advocates as well as those of politicized sex 
workers, to mobilize this knowledge by facilitating communication and resource-sharing to expand and 
enhance the work of these often quite divided groups, and thus to encourage much-needed critical 
engagement and information literacy skills concerning murdered and missing women and sex work. 

Current objectives for PARSD include the collection of Indigenous and non-Indigenous media 
representations of, and related academic, activist, and/or community-level initiatives undertaken since, the 
Canadian government’s official apology for Indian Residential Schools on June 11th, 2008. Other goals 
currently include the examination of PARSD records to find links and track intergenerational effects of 
residential schools; the filling of a gap in the Canadian Truth and Reconciliation Commission’s mandate by 
making PARSD records available to the Commission; and the encouragement of further critical engagement, healing, 
decolonization and reconciliation. 


2 Context 


To date, there are almost 600 confirmed missing and/or violently murdered Indigenous women across 
Canada (Amnesty International, 2009; NWAC, 2011). Some of these women are/were sex workers, and even 
the briefest consideration of North America’s colonial history (and present) provides many reasons why 
Indigenous women are over-represented in inner-city populations of women involved in outdoor sex work 
(Anderson, 2000; Razack, 2002; Smith, 2005). In this research project, the Missing Women Database forms 
a thematic bridge between the Sex Work Database and Post-Apology Residential School Database. 
Foregrounding this link brings into focus a myriad of connections between colonialism past and present, 
and the experiences of many contemporary Indigenous women and girls. Intergenerational effects of Indian 
residential school violence are known to substantially impoverish and disenfranchise Indigenous women and 
girls (Bombay, Matheson & Anisman, 2011; Deerchild, 2005; Anderson, 2000). Attempting to escape such a 
fate, many women and girls move from rural northern communities into southern urban contexts where, 
without adequate resources or cultural supports, many end up populating the poorest levels of the 
street-involved sex trade (Jacobs & Williams, 2008; Peach & Ladner, 2010). Despite political differences, therefore, 
the interests of those who would record and address Indian residential school violence, advocates for missing 
and murdered Indigenous women, and anti-violence sex worker activists interlock. Researchers and 
activists—including us—identify the ongoing violation and degradation of Indigenous women and girls as 
one of the most devastatingly obvious and far-reaching effects of colonization in Canada. 


3 Theoretical framework 


The archival and social media-based elements of the project are undertaken with the understanding that 
colonial, classed, raced, and gendered systems of (dis)empowerment operate in both technological and 
academic contexts (Gonzalez & Rodriguez, 2003; Brown & Strega, 2005; Wilson, 2008). Our research builds 
on the platform of digital divide literature (Warschauer, 2003), suggesting that access must mean “knowing 
how to use” as well as access to meaningful and representative content. More than this, our work strives to 
engage communities to digitally collect and preserve their cultural heritage in ways that are meaningful to 
them, and to engage in the process of creating the structure and relationships embedded within information 
systems that reflect their understandings and knowledge(s). 

The collections in this project are part of a growing trend towards more mediated and 
contextualized digital archives. We embrace the conceptualization of the activist archives because it 
emphasizes that archives are constructed spaces where struggles over meaning making take place. An 
activist archives includes: a commitment to social justice that privileges marginalized perspectives and users; 
the building of collections focusing on under-represented and marginalized perspectives; and the privileging 
of particular community interests such that it is developed collaboratively (Lile, 2012). Archival studies 
research emphasizes the power and importance of creating permanent representations of the voices of 
marginalized populations (Carter, 2006), as well as ways the documentary legacies of marginalized people 
can be used to counterbalance mainstream narratives in the struggle for justice (Harris, 2007; Jimerson, 
2009). 

This project brings together the research interests, political concerns, and community connections 
of researchers from three separate academic disciplines (Feminist Critical Inquiry, Political Science, and 
Information Studies). Our project draws into conversation Indigenous, feminist, critical race, and sex worker 
anti-violence theory and criticism. Such inter/cross-disciplinarity enables us to construct a more 
comprehensive framework through which to address ongoing violence against Indigenous and other 
racialized, poor, or sexually ‘transgressive’ women in Canada. As Dei, Hall & Rosenberg remind us, 
“Indigenous knowledges” may be understood as bodies of knowledge “associated with the long-term 
occupancy of a certain place” and built by groups cumulatively “through both historical and current 
experience” (qtd. in Shahjahan, 2005, p. 213). Central to our project is the understanding of such knowledges 
as, in Shahjahan’s words, “a rich social depot, which can bring about social justice in a variety of cultural 
contexts” (2005, p. 214). 


4 Methodology 


Described by Spinuzzi as a way to “understand knowledge by doing” (2005, p. 163), participatory design 
engages with stakeholders throughout a given project, from the articulation of project goals, to product 
planning, prototyping, and implementation. A participatory design approach has been successfully 
implemented to develop a number of Indigenous community-based digital information projects such as 
K-Net (Cracin, 2006) and Tribal Peace (Srinivasan, 2007). As well, this methodology has been extended by 
Shilton and Srinivasan to develop a participatory archiving model that facilitates the development of 
community-articulated metadata designed to create participatory archives that preserve 
community-identified cultural heritage in a way that “resonates with community understandings and knowledge” (2007, 
p. 96). Following Srinivasan, we “probe into the possibilities for communities to serve as the content 
creators, interface designers, and, most importantly, information architects and ontology creators of their 
own systems” (2007, p. 725). It is through this deep embedding of knowledge structures into the design of 
our digital systems that we begin to perceive how communities understand their worlds, how to build 
bridges between communities, and how to do the anti-violence research and activism at the centre of this 
project. 

At present we are developing our own participatory strategy. We use a combination of participatory 
methods, including hiring community members to act as consultants over the lifespan of the project and 
conducting a series of “town hall” style community meetings at key points during the project’s 
implementation. We will travel to major cities across Canada several times over the course of the project 
to gain as broad an understanding as possible of stakeholder concerns. Community consultants for the 
project, identified through longstanding research relationships with project investigators, will be integral to 
recruiting community members. 


5 Implications of Research 


As we begin this work, we have come to recognize that taking into consideration the voices and wishes of 
stakeholder communities sometimes means violating the information professional’s impulse to preserve. 
Accordingly, we have begun to consider the consequences of preserving and taking public even small sections 
of these collections. More specifically, it has become necessary for us to consider whether there is value in 
loss, change, or erasure of materials, especially in politically volatile contexts. For example, feminist 
communities in and outside the academy are not always safe spaces for sex workers. While some sex worker 
groups and individuals continue to speak out despite this subaltern status, others refrain from further public 
disclosure. Moreover, despite the essential legality of the exchange of sexual services for payment in Canada, 
whore stigma and existing prostitution-related laws that criminalize all but the actual exchange make it 
relatively risky (even dangerous) for people to ‘come out’ publicly as sex workers. Some content producers 
may, therefore, choose not to have their materials preserved in SWD. More work needs to be done that 
recognizes the value, indeed strategic nature, of archival silence and erasure for marginalized communities 
(Carter, 2006). Like other archival theorists before us (Kaplan, 2000), we continue to think through the 
impacts that the preservation of particular records may have on community groups or individuals. 

As suggested previously, the forms of violence associated with whore stigma and colonialism intersect to 
produce terrible effects on Indigenous women. In the context of MWD, the organizational practices we 
undertake—such as record tagging and description—draw together advocate-produced missing women 
posters, commemorative initiatives for murdered women, and dominant news media representations of 
murdered and missing women. Such juxtapositions facilitate a deeper understanding of the colonial and 
misogynist ideologies that produce extreme violence against Indigenous women. 

We use tags to bring together a variety of records that are disparate in format, content and tone. 
Hope Olson (1998) calls for “putting marginalized knowledge domains beside mainstream knowledge 
domains to create paradoxical spaces that are neither mainstream nor marginal but are both simultaneously 
or alternately”. She argues that classification schemes are socially constructed spaces based on mainstream 
knowledge systems. As such, they create room for particular constructs and ideas while limiting or silencing 
alternative perspectives. Typical strategies of preservation and representation obscure and make invisible 
the constructed nature of representational categories and vocabularies. We want to foreground not only the 
socially constructed-ness of the categories we employ, but also the process of determining these categories. 

In the SWD and MWD we create paradoxical spaces by applying specific terms to item records. 
Many of the tags deployed in this context are terms that make us uncomfortable. These are terms such as 
hooker and whore. We identify these terms as tags when the word appears in an item. Within sex work and 
murdered and missing women activist communities, these terms are occasionally used as reclamations. 
Outside of these communities, the terms are often used to sensationalize and marginalize. Retrieval using 
these tags puts contradictory items alongside each other, creating a paradoxical space. We hope that this 
political juxtaposition makes room for dialogue and action. 
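
The retrieval behaviour described above can be illustrated with a minimal sketch. It is not the project's implementation: the record fields, the source labels, and the placeholder tag `term-x` are all hypothetical, chosen only to show how a single tag can surface community-produced and dominant-media items side by side.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ItemRecord:
    """One archived item. All field names are illustrative assumptions."""
    title: str
    source: str  # hypothetical labels: "community" or "dominant-media"
    tags: frozenset = field(default_factory=frozenset)


def build_tag_index(records):
    """Map each tag to the list of records that carry it."""
    index = defaultdict(list)
    for record in records:
        for tag in record.tags:
            index[tag].append(record)
    return index


def retrieve(index, tag):
    """Return the titles of every record carrying the tag, grouped by source."""
    grouped = defaultdict(list)
    for record in index.get(tag, []):
        grouped[record.source].append(record.title)
    return dict(grouped)


# Hypothetical sample records; "term-x" stands in for a contested term.
records = [
    ItemRecord("Advocacy poster", "community", frozenset({"term-x"})),
    ItemRecord("Newspaper headline", "dominant-media", frozenset({"term-x"})),
    ItemRecord("Memorial quilt photo", "community", frozenset({"commemoration"})),
]
index = build_tag_index(records)
result = retrieve(index, "term-x")
```

In this sketch a query on one tag returns items from both sources together, which is the juxtaposition the paradoxical space relies on. A design that honoured a community request to keep the two kinds of representation apart would need an additional filter on `source`, which is omitted here.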

But there is more to take into consideration here. In the recent past, dominant Canadian news 
media’s standard representations of missing and murdered Indigenous women—when they cover these cases 
at all — have been critiqued by feminists and Indigenous groups for representing violence against these 
persons as unremarkable. Dominant news media has also been condemned by feminists and Indigenous 
groups for invoking street-involvement, prostitution, and illegal drug addictions, even when the women in 
question were not involved in any of the above (Ferris, 2007, 2014; Jiwani & Young, 2006). Community- 
produced representations of the same women provide significantly more complex, nuanced, and mournful 
representations of beloved mothers, daughters, sisters, aunties, grandmothers, friends, lovers, community 
activists, scholars, and much-missed community members (NWAC, 2010). In that many community- 
produced representations appear to ‘speak back’ to dominant media accounts to cultivate public concern 
for loved ones, the drawing together of these disparate representations in MWD could be very useful for 
many audiences. 

However, because dominant narratives are backed by more than 500 years of colonialism, 
patriarchy, and whore stigma, some advocates for missing and murdered women have asked that dominant 
media and community-produced representations not be placed alongside one another, even in this activist 
archives. Indeed, we have begun to consider whether, in instances like these, dominant representations 
should be archived at all. Meeting the needs of community members while maintaining larger project goals 
of information literacy, decolonization, and anti-violence work requires thoughtful consideration, and is an 
ongoing process. Determining how and in what context paradoxical spaces should be employed in MWD, 
and perhaps even whether they should be employed at all, is part of this process. 
6 DAMC Implementation 


We anticipate that it will take at least four years to research and design PARSD, and to complete baseline 
development and implementation of MWD and SWD. Access—public or otherwise—to materials in each 
archives will be determined by stakeholder groups. Media records in the archives will be public only if 
permissions are acquired, but will be privately accessible for research purposes. To date, we have established 
partnerships with activists and academics in cities across Canada. In consultation with all of these partners, 
we have developed permissions processes that provide options in terms of the public availability of anything 
groups or individuals choose to include in a given database. As noted above, we are also in the process of 
developing a participatory design process that engages community stakeholders in project development 
throughout the duration of the project. Material collection, organization, and description are ongoing 
processes that will be nuanced and transformed as the participatory process continues. 


7 Conclusion 


We have only begun to explore the implications for both anti-colonial feminist and Information Studies 
scholarship of engaging together in this multidisciplinary collaboration. Typically, we, in Information 
Studies, have thought of ourselves as only the architects, providing the structure through which particular 
pieces of information are accessed (Bates, 1999). More recent work has challenged this notion, suggesting 
that the lenses through which we see the world are necessarily built into the structure of information and 
classification systems (Bowker & Star, 2000). When we see our role as only that of categorizing knowledge, 
we exempt ourselves from the responsibility to engage with fundamental questions about the relationship 
between methodological frameworks, how and what meaning(s) are created in this process, and what the 
implications of the representations we create are for the communities with whom we work. The Digital 
Archives and Marginalized Communities Project is still in its infancy. At present, we have only begun to 
explore the implications for Information Studies of engaging in feminist anti-colonial and anti-violence work. 
It is clear, however, that there is much to be considered in this under-theorized area. The first years of 
collaboration on this interdisciplinary project have demonstrated to us that there is much we can learn from 
one another. There is much that we can and should do together; and our partnerships are far greater than 
the sum of their parts. 
Abstract 

Research into teams has focused largely on intra-team or sub-team activities. Although activities 
involving two or more teams are becoming increasingly common due to outsourcing and globalization in 
the workplace, there are few studies about them. In this project, we studied two engineering teams and 
their activities for five months. The two teams, which belong to a Canadian company, are located in 
different countries. We collected different kinds of data to explore various aspects of the teams and their 
activities. In this research note, we report our preliminary findings about the two teams’ communication 
practices. Specifically, the findings suggest that despite the presence of video conferencing tools, file sharing 
tools, electronic mail, and phones, onsite visits and/or face-to-face interactions have a great impact on 
members’ satisfaction with the experience of working with a remote team of a different national 
culture. 
1 Introduction 


Many studies have been conducted to understand group dynamics (e.g., Janis, 1982; Hoyt & Blascovich, 
2003), predict group performance (Kolfschoten et al., 2011; Kelly et al., 2011; Busche & Coetzer, 2007), and 
improve the quality of group activities (Shapira et al., 2001; Spring & Vathanophas, 2003). Team decision 
making, in particular, has been studied extensively in different fields. For example, a keyword search of 
“team decision making” returns 752 results in Business Source Complete (a leading database for scholarly 
work in business) and 661 in ProQuest (a leading database for scholarly work in humanities, social sciences, 
and education). The introduction of group technologies enables teams to work together with fewer time and 
location constraints. Various studies have investigated technology-related issues in supporting and enriching 
team decision making, focusing either on technology design and evaluation or on teamwork issues 
introduced by technology-mediated communication and collaboration channels. 

Our work is focused on understanding and supporting decision-making activities between two 
teams. Between-team activities are different from those of sub-teams, sometimes referred to as subgroups 
in the literature. A subgroup is a collective entity that is characterized by a form or degree of 
interdependence that is unique when compared to that of other members, and it has to be a subset of 
members of the same work team whose membership and tasks are formally recognized by the organization 
(Kozlowski & Bell, 2003). Although there have been a number of studies on subgroups in teamwork (e.g., 
Ocker et al., 2011; Carton & Cummings, 2012), there are far fewer studies investigating activities that 
involve two or more groups. At the same time, decision-making activities that involve two international 
teams are increasingly common in our globalizing work environment. According to research in organizational 
behavior (e.g., Tannenbaum et al., 2012), many global teams may have new unique characteristics that are 
not yet well understood. For example, little is known about how teams adapt to distance and communication 
technologies. In an attempt to address this gap in the literature, we studied an international company’s two 
engineering teams that regularly faced the situation of making design decisions together in their work. These 
teams are located in two countries: one is at the company’s home location in a major Canadian city, and 
the other is at a branch office in a major Chinese city. During the study, we collected various kinds of data 
on different group variables, adopting McGrath’s (1984) classical conceptual model of small groups. 
In this report of preliminary findings, we describe the observed teams’ communication practices and 
discuss the implications of our findings. 


2 Related Work 


Our literature review shows that the study of culture’s impact on teamwork has focused on three levels 
of cultural understanding: national culture, organizational culture, and team culture. Researchers have 
examined the impact of national culture on decision-making styles, process, and involvement (Muller & 
Ozcan, 2008; Waragarn & Rafique, 2007). The research focus has mainly been on understanding the 
differences in decision-making styles among different cultures, e.g., a comparison of decision-making styles 
between German and Swedish teams (Turner & Muller, 2003). Given that Canada and China have 
two distinct cultures, we expect that national culture plays an important role in the process. For 
example, China’s cultural tendency is toward high power distance due to its Confucian roots (Martinsons 
& Westwood, 1997). This leads to a hierarchical power structure in the workplace, whereby project 
managers or team leaders are often much more powerful in decision-making processes than their counterparts 
in teams of Western cultures. Also, Chinese people prefer group-based operations emphasizing individual relationships and 
informal forms of communication within small groups (i.e., guanxi network) (Lai et al., 2001; Kunnathur & 
Shi, 2001; Zhang et al., 2003). 

Schein defined organizational culture as a set of implicit assumptions shared within the group that 
determine its perspective of and reaction to various environments (Schein, 1992). The empirical study by 
Hofstede et al. (1990) showed shared perceptions of daily practices to be the core of an organization’s 
culture. Although there have been many studies relevant to organizational culture (e.g., Hofstede et al.’s 
paper has been cited over 2,000 times according to Google Scholar), we have not identified articles that 
address the role of organizational culture in team decision-making. However, there are a few studies that 
relate organizational culture to organizational decision-making or decision-making processes in the 
organization in general. For example, in demonstrating how leaders could create an organizational culture that 
supersedes national culture values and norms, McLaurin (2008) discussed how different organizational 
cultures could affect decision-making practice and process in the organization. Feldman (1988) presented 
how organizational culture affects the organizational decision-making process in his work on innovation in the 
organization. 

Researchers also study team culture. Teams are microcosms of organizational culture (Suzuki, 
1997). Although there is no clear definition given from the literature, group culture in general is considered 
to include a set of norms and values that are about how things should work and how people should behave 
in a group (Schein, 1985). The values and attitudes of the working group affect the behavior of the group, 
whose collective patterns of behavior contribute to the group culture. The group culture, in return, has 
significant impact on the values and attitudes of the group. We found very few research studies that 
investigated team culture or group culture in business setting (6 results returned with the keyword search 
“team culture” in title of the articles in business source complete), and no article was found that discusses 
the role of team culture in team decision-making. Hofstede’s measure (1980) for organizational culture was 
used in Workman’s study (2005) to measure virtual team culture such as team’s structure, relationships, 
and primacy. 

In our study, we considered the effects of both national culture and team culture in the between-team 
decision-making processes. For example, we collected data to understand each team’s characteristics 
and intercultural sensitivity, practices in communication and decision-making, and strategies in conflict 
management. Comparing these dimensions of the two teams gives us a better understanding 
of the cultural differences and influences in between-team decision-making activities. 


3 Research Methodology 


The two international teams consisted of members from their country of residence (i.e., Canada and China). 
We assumed several key aspects of their team cultures that would affect the between-team decision making: 
the team members’ attitudes on intercultural communication, their personalities that would affect their 
work attitudes, their communication practices, their decision-making styles, and their conflict management 
styles. During the six month study period, we used different data collection techniques to help us 
understand these aspects, including two sets of online questionnaires to measure the team members’ 
intercultural sensitivity and personalities, three semi-structured interviews per team member to measure 
the teams’ communication, decision-making, and conflict management styles, over 70 hours of field 
observations to understand the teams’ daily work practices within the team and between the teams 
(observation of the remote team meetings), and electronic mail records that showed the between-team 
communications. We were not able to be on the mailing lists of the teams so as to collect all the 
communication records between teams and/or within the team during the study period. This was the 
company’s decision due to its concerns about the leaking of sensitive information. Instead, the company 
designated a team member who was a participant in our study to forward us the emails that were deemed 
to be sharable to the researchers. We were not able to attend all the meetings either for the same reason. 
In total, we attended three meetings (two within-team meetings and one between-team meeting) and received 147 
emails covering nine topics. We also hoped to analyze the company’s policies regarding teamwork 
and communication between remote sites but were told that such documentation was not available. 

We report here the preliminary findings from our interview data that helped us understand the 
difference and similarity in communication practice between the two teams. 


4 Preliminary Findings — Teams’ Communication Practices 


We interviewed each participating member three times during the study. The first interview was about 
members’ roles, experiences of working in multicultural environment, and teams’ history. The second 
interview was about communication structures and practices, meeting structure and practices, and 
interpersonal relationship. And the third interview was about conflict management. In the third interview, 
we also included the interview questions related to understanding the impact of national culture on decision- 
making styles. The third interviews’ questions were adapted from Waragarn and Rafique’s study (2007). 

Overall, thirty interviews were conducted with fifteen on the Canadian site, five on the Chinese 
site, and ten with the Chinese team members via Skype. All interviews were conducted face-to-face within 
a one-to-one setting. All interviews were audio recorded and the recordings were transcribed. The interviews 
were conducted in Chinese and English depending on the interviewees’ primary language. 


4.1 Forms of Communication within the Teams 


Both teams acknowledged that in-person communication is a common communication method within the 
team. All interview participants from the Canadian team mentioned in-person communication before they 
mentioned any other form, which suggests that it is the first kind of communication that they are likely to 
use, and the one that they are most likely to rely on. Three participants mentioned that they are able to 
talk in person, and two specifically mentioned that it is very easy for them to talk in person. One participant 
explained that it is intentional to have team members sit close to each other: “All the teams, we try and 
cohabitate them, so they are sitting together. So there’s a lot of interaction just...they’re all within a short 
distance, so a lot of communication that way. And if the rest of the team is in XX, they’ll go and talk to 
them, go to their desk and talk to them.” Four Chinese participants also suggested the main communication 
method is face-to-face. All implied the office’s physical setting contributes to the possibilities for face-to- 
face communication. One participant mentioned that face-to-face communication could fully meet the needs 
for primary internal discussions. 

Email is considered another common approach used for internal communication within both teams. 
However, there seems to be a difference between the teams in terms of how frequently email is used. In the 
Canadian team, three of five participants acknowledged that email is their most common communication 
method. One participant mentioned that he writes “20 emails a day, work emails” and that communication 
“is email-intense”. Another participant acknowledged a preference for email by saying: “Here, we generally 
always send by email.” Such statements were not observed in interviews with the Chinese team members. 
One participant commented that there are about two to three emails in a day. Another Chinese participant 
explained that depending on the problem’s complexity and importance, emails may be used for receiving 
background information or involving more people in the process. Why emails are so commonly used in the 
Canadian team is unknown, although one participant made note of the record-keeping and follow-up 
potential with email, noting: “Sometimes I use email...I’m not a big email person, unless I am trying to 
record things. But if I go and talk to somebody in person, I’ll just send an email about it.” Interviews with 
the Chinese team members suggested several reasons for using emails within the team including the 
record-keeping and follow-up potential, the importance of the matter, getting other people involved, and sending 
Internet links or files. 

There also seems to be a difference between two teams in terms of the phone usage for local 
communication. Phone use is somewhat rare for local communication within the Canadian team, with three 
participants mentioning low local phone use and giving reasons for it. For example, one participant states 
that, “Phone’s rare. Well, rare enough.” And the second participant states that “Phone not so much either, 
unless I feel somewhat lazy”. Four Chinese team members mentioned phone usage for local communication. 
All participants commented on its use on different occasions. For example, two participants noted the 
practice of calling members when they are off site, and four participants noted the practice of using it when 
members are after work or on vacation. There is no clear reason why the Chinese team members seem to 
be more likely to make phone calls when off-site, a practice that is not mentioned by the Canadian team 
members. 

All four participants who hold managerial roles in the company noted that they have regular team 
meetings. One Canadian engineer/designer participant also acknowledged regular team meetings. Only one 
of the three Chinese engineer participants noted that team meetings are not design review meetings and 
explained that such meetings are need-based. 

One Chinese participant’s response implies that instant messaging (IM) is a tool that is allowed at 
the Chinese site, and that the team members sometimes use IM. One Canadian participant’s response, 
however, suggests that IM is not allowed at the Canadian site. 


4.2 Forms of Communication between the Teams 


We asked both teams’ participants how they communicate with the other team. Canadian participants’ 
responses are somewhat diverse. One participant discusses communication in a general sense, claiming that, 
“That is on an individual basis, and a need to basis. So if you are working on something that needs to be 
communicated, then we take it upon ourselves to initiate or carry forward that communication.” This 
statement appears to position communication primarily as a need and a function of the job. 

The other Canadian participants discuss communication in more specific terms such as meetings, 
phone calls, and email. Three participants frame communication with the Chinese team at least partially in terms 
of meetings, which is a regular form of communication. Two of these participants also mention using the phone 
to communicate with the Chinese team. However, a different Canadian participant notes that the 12-hour 
time difference makes it impractical to use the phone as a regular communication method between the teams. 
As a result, he/she states that most of his/her communication with the Chinese team is by email, but this results in 
a one-day delay in receiving a response. 

Only one Canadian participant discusses communication with the Chinese team in terms of more personal 
communication, such as chatting with the team from home. Interestingly, the same participant comments 
on the use of IM as a tool for communicating with Chinese team members and anticipates that a 
video conferencing tool would help improve relationships between the teams, explaining: “So I 
can see them and they can see me. That would keep that relationship going”. 

The responses of the Chinese participants are more homogeneous compared to those from the 
Canadian team. All Chinese participants discussed the specific communication methods used, including 
email, conference calls, and meetings. Their perspectives on these methods are somewhat similar. Emails 
are the main method to communicate with the Canadian team, according to all participants. However, one 
participant also noted that he sometimes made phone calls to the Canadian team in urgent situations. Two 
participants mentioned the database in terms of sharing blueprints. One participant said: “The database 
we use to save blueprints is called the vault. If anyone makes any changes, they need to check in and email 
the team of which prints he made a change.” Another participant mentioned that any change he made 
would be available to the other team. In the interview only one Canadian participant mentioned that he 
made phone calls to the Chinese team. Interestingly, this is the same participant who commented that phone 
calls are impractical because of the 12-hour difference. Chinese participants’ responses on communication also 
indicate the need-based communication approach. None of the Chinese participants noted that they engaged 
in more personal communication, such as simply chatting, with the Canadian team. 


4.3 Similarities and Differences in Communication Between the Teams 


The Canadian participants were asked to compare communication within the team and with the Chinese 
team. Accessibility is recognized as a significant issue that explains the differences in communication 
between the teams. Participants recognized that the accessibility issue is due to the distance and different 
time zones. Two participants who have managerial roles in the company note the impact of accessibility 
issues on management. For example, one participant explained that although he/she would use the same 
authoritative style, he/she might “...be more careful how that tone [comes] across” especially if he/she is 
talking on the phone. For him/her, it is very important not to “insult them or have them shut down”. 
His/her response indicates a focus on equality while still understanding that there are differences between 
the teams both in terms of culture and in terms of how communication happens (for instance, little to no 
face-to-face communication). The other participant commented on how accessibility might have affected 
decision-making processes, noting that, “I think that sometimes with the decisions we tend to have more 
significant decision conversations. The time to have that discussion is quite different. So we might have 
pre-conversations here and involve nobody from [the Chinese team] and then involve them later. Whereas 
when it is here, it is much easier to involve everybody in right away”. The same participant also commented 
that when comparing the communication between the two teams what interests him/her more is not the 
communication method or style but the content of the communication. 

The Chinese participants also talked about the problems with distance. One participant mentioned: 
“Because of the distance, it takes time to explain detailed situations happening in one site to the other 
team”. Another participant said that “Canadian engineers are familiar with people from different 
departments in Canadian, they know each other, work together... Sometimes the message delivered from 
Canadian covers the most important information, or the most important opinions, (but not the whole 
picture).” The Chinese participants also commented that the atmosphere of communication with their local 
team is more casual and direct, and they believed the same situation applies to the Canadian team when 
communicating with their local members. Although the engineers from both sides share similarities in using 
graphic design software and/or drafts to exchange opinions, participants are aware that not being able to 
communicate face-to-face causes differences in response time and efficiencies. This might explain why all 
five participants from the Chinese team talked about company visits as one of the communication methods 
in their interviews. According to one of the participants, “this type of visit received good feedback from 
both sides”. 


5 Implications 


Overall, the preference for personal face-to-face communication rather than remote technologies is marked 
in both teams. This seems to indicate that technologies that bring intimacy or synchronicity to the 
communication will improve the communication flow and/or experiences between two teams. However, 
there seems to be some reluctance around using particular technologies for communication, even when they 
are available (e.g., IM and video conferencing). One participant seems to be keen on using these 
technologies. In contrast, another mentions that although they have increased their IM use, and use it a 
lot now, they do not feel efficient on some technologies. A different participant also raises another concern, 
and explains that although he/she is familiar with IM, his/her use at work, “depends on how much time I 
have. Sometimes you are busy and the message pops up, you know. In IM, people want to talk right away. 
And I don't have time to talk to them.” 

Research literature offers several explanations why teams might not be willing to leverage available 
communication technologies. Thompson and Coovert (2003) observed a negative impact of 
computer-mediated communication on teamwork. Tannenbaum et al. (2012) suggested that although some technologies 
offer 24/7 connectivity, they are not always perceived as useful for remote teamwork in organizations 
because of the different work hours in different countries. In addition, the use of communication technologies 
has been found to negatively affect the team decision-making process. In their meta-analysis 
of studies about computer-mediated communication and group decision making, Baltes et al. (2002) found that 
computer-mediated communication leads to decreases in group effectiveness and increases in time compared to 
face-to-face groups. Credé and Sniezek (2003) found that compared to face-to-face groups, video-conferencing 
groups showed lower levels of confidence in their decisions. It is worth noting that these studies were 
conducted over a decade ago and were about within-team communications. Since then, groupware 
technologies have advanced rapidly and remote teamwork has become an expected work style in the 
workplace. It would be interesting to revisit these issues in the current remote teamwork context. On the other 
hand, our findings indicate that members’ preferences and practices of communicating with other teams 
have not embraced modern groupware technologies. Moreover, our findings suggest that communication 
strategies that would work for within-team work might not be applicable to between-team situations. For 
example, Campbell and Stasser (2006) found that allowing team members ample time to discuss the task could 
enhance information sampling and decision quality in computer-mediated groups. However, we found that 
in the fast-paced industrial environments where the two teams operate, time is a very limited resource. Team 
members prefer to allot more time to discussing engineering problems face-to-face with local team members 
rather than communicating their ideas and solutions with the other team. In summary, the 
communication practices we observed in the two teams suggest that additional design requirements of groupware 
technologies need to be explored to support between-team work. 

As Yin (2003) pointed out, analytical generalizability in case study research is the ability to 
generalize research results to a theory. This allows work conducted with small sample populations to be 
applied to broader theoretical considerations of the phenomenon being studied. Analytical generalizability 
makes it possible to consider how our research fits within and contributes to broader considerations of 
inter-cultural teamwork. When compared with existing research, this work adds to existing theories on 
inter-cultural teamwork, particularly with respect to inter-cultural and remote communication. Theories suggest 
that inter-cultural communication can result in many communicative challenges, particularly with respect 
to working within a team (Zakaria, Amelinckx, & Wilemon, 2004; Oertig & Buergi, 2006), and research 
indicates that conflicts are more likely to occur in teams that are culturally heterogeneous (Dunkel & 
Meierewert, 2004). Our research also accounts for these concerns from a between-team perspective and 
illustrates some of the challenges experienced by Canadian and Chinese teams with respect to differences 
in national culture and team culture. Furthermore, given participants’ efforts to communicate effectively 
with remote team members, our research also reaffirms the importance of intercultural competence and 
intercultural sensitivity in addressing the challenges of inter-cultural teamwork (Matveev & Milter, 2004; 
Matveev & Nelson, 2004). 


6 Conclusion and Future Work 


With the increasing number of overseas branches in information and knowledge workplaces, it is expected 
that decision-making activities that involve members of two teams will become more and more common, 
following the common distributed teamwork processes. However, few studies investigate 
between-team activities; hence, our understanding of the between-team decision-making phenomenon is 
limited. In our study, we collected various kinds of data from two engineering teams of a Canada-based 
company to understand the teams’ characteristics, sensitivity to intercultural communication and 
collaboration, and teams’ own practices in communication, decision-making, and managing conflicts. 

In this research note, we report partial results of our study: we compare and contrast two teams’ 
communication practices and the implications of the practices on the performance of the decision-making 
activities. Next, we will examine the other aspects of the teams’ practices, i.e., their decision-making and 
conflict management styles, and daily group interaction patterns (e.g., whether the members of the two teams 
had similar levels of group interaction in the workplace). Our ultimate goal is to identify the key factors of 
between-team decision-making activities. 


Abstract 

This exploratory study examines the relationships among cultural values, sources of information about a 
current event, and perceptions of national security. The study uses the case of Edward Snowden and his 
actions in releasing information classified as secret by the U.S. Federal government. The study compares 
the perceptions of survey respondents from India and the U.S. at two times soon after Snowden released 
the information and examines the relationship among cultural values, information sources, and 
perceptions of Snowden and his actions. The cultural dimension follows the Hofstede cultural values 
measures of power distance and individuality, measures in which India and the U.S. exhibit significant 
differences. The survey was conducted using Amazon’s Mechanical Turk (MTURK) to solicit responses 
in July and August 2013. The results reveal that the U.S. and India respondents agree on some aspects 
of the case (e.g., that Snowden is a courageous individual) and do not shift their viewpoints from the 
first survey to the next. However, the respondents differ significantly in their use of information sources 
and report significantly different opinions on the potential impact of Snowden’s actions on national 
security issues. This limited study revealed an unexpected difference from Hofstede’s work in the power 
distance cultural dimension, raising questions about the use of MTURK for cross-cultural studies. 


1 Background 


The Case: Most people who have access to news sources know something about Edward Snowden and his 
actions in releasing information—classified as secret by the U.S.—that detailed how U.S. intelligence 
programs had been collecting telephone metadata and Internet records on citizen and government 
communications as part of a massive surveillance effort. Snowden, an employee of Booz Allen Hamilton, a 
contractor to the U.S. National Security Agency (NSA), provided information to The Guardian, which 
published the information in a series of articles in the newspaper beginning in June 2013 (Gidda, 2013). 
Snowden released his information while in Hong Kong. As this is written, Snowden is living in Russia, which 
granted him asylum and has resisted extradition requests from the U.S. Moreover, additional information 
on the surveillance program continues to be released through newspaper stories. 

Culture: Hofstede (2001), based on earlier work by Kluckhohn (1961), defines culture as “the collective 
programming of the mind that distinguishes the members of one group or category of people from another” 
(p. 9). In a series of studies of community values, initially in the late 1960s with IBM employees from 
different countries, he identified several dimensions of cultural differences. In our study, we are interested 
in two of these: power distance (PDI), which is a measure of the degree of acceptance of inequality in society 
by its members; and individualism (IND), which is a measure of how loosely members of a society feel tied 
together. Low power distance represents a more egalitarian society than one with high power distance; and 
low individualism represents a more collectivist society. Hofstede’s indicators are relative measures, not a 
ratio scale. 

In this study, we examine differences and relationships among these two cultural measures for 
responses from India and the US, the information sources used, and perceptions of Snowden and the possible 
consequences of his actions. We collected data at two points in time separated by about five weeks in order 
to assess if there were changes in perceptions in this period. 


2 Method 


We conducted the study by recruiting participants in the U.S. and India using Amazon’s Mechanical Turk 
(often abbreviated as MTURK). Using MTURK has advantages of response time and cost-effectiveness 
compared with other recruitment methods (using students, direct mail, etc.), and the quality of responses 
from MTURK can be relatively high. In one study, only 4.17 percent of respondents failed a quality control 
question, compared with failures of 6.47 percent and 5.26 percent for participants from a university and 
Internet message board, respectively (Paolacci, Chandler, & Ipeirotis, 2010). The cost per usable response 
can be low (less than $1, even for surveys that take 20-40 minutes to complete). The use of crowdsourcing 
for research has increased in popularity and acceptance for these reasons and others (Howe, 2006; Kittur, 
Chi, & Suh, 2008; Mahmoud, Baltrusaitis, & Robinson, 2012). 

In our study, the survey instrument for both time periods included questions about the use of 
different information sources for learning about the Edward Snowden situation. The questions asked about 
the use of 1) Blogs; 2) Online social media discussions; 3) Search engine news; 4) Online news services; 5) 
Television shows; 6) Personal discussions and email exchanges, and 7) Newspapers (including online 
versions). Responses were in the form of an anchored five-point Likert scale (1=not used at all; 2=used 
rarely in one week; 3=used at least weekly; 4=used daily, and 5=used several times per day). Both surveys 
asked for the respondents’ degree of agreement with statements related to security and to how Snowden 
might be viewed (e.g., as publicity-seeker; courageous whistle-blower; etc.). Finally, both surveys included 
demographic questions. We also asked for the respondents’ views of Snowden as a person and views of the 
significance of his actions to personal and national security, again using a Likert response scale. 
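
The anchored scale described above maps each response label to an integer code, from which per-item sample means (M) and standard deviations (s) can be computed. The following is a minimal sketch of that coding step, not the authors' actual analysis script; the function name and the sample responses are hypothetical and serve only as illustration.

```python
from statistics import mean, stdev

# Numeric codes for the five anchored usage labels given in the text.
USAGE_SCALE = {
    "not used at all": 1,
    "used rarely in one week": 2,
    "used at least weekly": 3,
    "used daily": 4,
    "used several times per day": 5,
}


def summarize_item(labels):
    """Return (sample mean, sample standard deviation) of the coded responses."""
    codes = [USAGE_SCALE[label] for label in labels]
    return mean(codes), stdev(codes)


# Hypothetical responses to one item (e.g., use of blogs); not data from the study.
example_responses = [
    "not used at all",
    "used daily",
    "used at least weekly",
    "used rarely in one week",
    "used several times per day",
]
item_mean, item_sd = summarize_item(example_responses)
```

Treating the codes as interval data when averaging them is a common simplification for Likert-type items and is assumed here.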

The first survey included questions to measure cultural values and other personal measures that 
are not discussed in this research note. The cultural values questions are taken from Hofstede’s Values 
Survey Module (1984, 2008); we used the 2008 version. 

We included quality control questions intended to assure that the respondent read and understood 
the questions and was not providing responses simply to complete the task. We eliminated responses 
that failed these simple quality control questions. 

We had a total of 101 usable responses (respondents completed both surveys and passed the quality 
control questions) from the US and 107 from India. The analysis of these sets of responses forms the basis 
of our findings. 
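
The usability rule stated above (a respondent completed both surveys and passed the quality control questions) can be expressed as a simple filter. This is a sketch under assumptions: the record fields and the example records are hypothetical and do not describe the actual survey export.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical fields; the real data layout is not described in the text.
    worker_id: str
    country: str
    completed_phase1: bool
    completed_phase2: bool
    passed_quality_checks: bool


def usable(respondents):
    """Keep respondents who finished both surveys and passed quality control."""
    return [
        r for r in respondents
        if r.completed_phase1 and r.completed_phase2 and r.passed_quality_checks
    ]


def count_by_country(respondents):
    """Count respondents per country."""
    counts = {}
    for r in respondents:
        counts[r.country] = counts.get(r.country, 0) + 1
    return counts


# Illustrative records only; not data from the study.
records = [
    Respondent("w1", "US", True, True, True),
    Respondent("w2", "US", True, False, True),
    Respondent("w3", "India", True, True, False),
    Respondent("w4", "India", True, True, True),
]
usable_counts = count_by_country(usable(records))
```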


3 Findings 

The expected alignment of cultural values from our survey with the prior work of Hofstede was mixed. As 
anticipated, we found a significant difference between US and India responses on both PDI and IND. The 
direction of the difference in IND was as expected: the US sample measure score was significantly higher, 
indicating a higher degree of individuality in the US compared with more collectivism in India. However, 
the direction of the difference in PDI was opposite what we expected: the US sample score was higher than 
the India score, which would indicate a less egalitarian orientation in the US sample. We say more about this anomaly in 
the discussion section. 

There were no significant changes in response statistics from the first survey to the second survey, 
either in the use of the information sources or in the agreements with the statements about Snowden and 
the impact of his actions on security. 

We observe significant differences (p < .01) in sample means from the US and India on the 
frequency of use of different information sources (Table 1). The India sample reports more frequent use of 
all sources compared with the US sample. Similarly, on two of the questions about Snowden’s motivation 
(Table 2), we see significant differences between the US and India. Finally, on most of the questions about 
potential impacts on security (Table 3), we find significant differences between the two samples, 
with the India responses generally indicating greater agreement with the statements. 
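
For readers who want to see how a t-statistic of this kind follows from summary statistics, here is a minimal sketch of the two usual independent-samples forms: the pooled form (equal variances assumed) and Welch's form (equal variances not assumed). The text does not state which software or exact procedure produced the reported values, and per-item sample sizes are not given, so the inputs below are hypothetical and the output is not expected to reproduce any table entry.

```python
import math


def pooled_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic for mean_b - mean_a with equal variances assumed."""
    pooled_variance = (
        (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / standard_error


def welch_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic for mean_b - mean_a without assuming equal variances."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# Hypothetical summary statistics (mean, sd, n) for two groups; not study data.
t_pooled = pooled_t_statistic(2.0, 1.0, 100, 3.0, 1.0, 100)
t_welch = welch_t_statistic(2.0, 1.0, 100, 3.0, 1.0, 100)
```

When the two groups have equal sizes and equal standard deviations, as in this illustration, the two forms give the same value; they diverge as the variances or group sizes differ.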

Two clear exceptions to the differences between the two samples stand out, one related to Snowden’s 
motivation; the other related to the impact on long-term security for the US. The two samples were in 
agreement with the statement, “In my view, Snowden is a courageous individual who followed his 
conscience.” For this statement, the US and India sample means were almost identical, tending toward 
more agreement than disagreement. Similarly, in response to the statement, “In my view, Snowden’s action 
in the long run will make for a stronger and more secure U.S. society,” there was no significant difference 
in response means. Both samples tended to agree with this statement. 


Rate how much you have used each of the following sources of information to learn about Edward J. Snowden, his disclosure of U.S. surveillance activities, and his legal situation: 
1=not used at all; 2=used rarely in one week; 3=used at least weekly; 4=used daily; 5=used several times per day 

Source                              Country       Phase 1            Phase 2 
Blogs                               U.S.          M=1.95; s=0.999    M=1.81; s=1.003 
                                    India         M=2.72; s=1.139    M=2.42; s=1.063 
                                    Δ in Means    0.762              0.614 
                                    t-statistic   4.342**            3.755**+ 

Online social media discussions     U.S.          M=2.09; s=1.127    M=1.96; s=0.982 
                                    India         M=3.33; s=1.307    M=3.25; s=1.156 
                                    Δ in Means    1.236              1.290 
                                    t-statistic   6.294**+           7.466** 

Search engine news                  U.S.          M=2.40; s=1.196    M=2.40; s=1.172 
                                    India         M=3.53; s=1.153    M=3.58; s=1.002 
                                    Δ in Means    1.128              1.184 
                                    t-statistic   5.869**+           6.694**+ 

Online news services                U.S.          M=2.66; s=1.199    M=2.55; s=1.137 
                                    India         M=3.24; s=1.232    M=3.36; s=1.043 
                                    Δ in Means    0.584              0.814 
                                    t-statistic   2.959**+           4.665**+ 

Television shows                    U.S.          M=2.37; s=1.268    M=2.43; s=1.155 
(including online TV sites)         India         M=3.40; s=1.256    M=3.39; s=1.203 
                                    Δ in Means    1.035              0.961 
                                    t-statistic   5.044**+           5.145**+ 

Personal discussions and            U.S.          M=1.62; s=0.991    M=1.76; s=1.031 
email exchanges                     India         M=2.72; s=1.265    M=2.66; s=1.241 
                                    Δ in Means    1.096              0.901 
                                    t-statistic   5.842**            4.871** 

Newspapers                          U.S.          M=2.05; s=1.170    M=2.11; s=0.972 
(including online versions)         India         M=3.82; s=0.927    M=3.83; s=0.939 
                                    Δ in Means    1.772              1.719 
                                    t-statistic   10.450**           11.291**+ 

** Significant at the .01 level (2-tailed)    * Significant at the .05 level (2-tailed) 
-- Not Significant                            + Equal variances assumed 

Table 1: Use of Information Sources; Sample Means 


Degree of agreement with statement, 1=strongly disagree; 5=strongly agree 

In my view, Edward Snowden...       Country       Phase 1            Phase 2 
..is a courageous individual        U.S.          M=3.71; s=1.140    M=3.59; s=1.191 
who followed his conscience.        India         M=3.72; s=0.997    M=3.71; s=0.941 
                                    Δ in Means    0.004              0.119 
                                    t-statistic 

..broke the laws of the U.S.        U.S.          M=3.03; s=1.234    M=2.99; s=1.202 
and thus deserves to be tried       India         M=3.37; s=1.166    M=3.36; s=1.014 
in court.                           Δ in Means    0.339              0.373 
                                    t-statistic   --                 2.085*+ 

..is a publicity seeker and         U.S.          M=2.31; s=1.242    M=2.32; s=1.199 
hopes for personal gain from        India         M=3.43; s=1.221    M=3.31; s=1.237 
his actions.                        Δ in Means    1.122              0.986 
                                    t-statistic   5.601**+           5.086**+ 

** Significant at the .01 level (2-tailed)    * Significant at the .05 level (2-tailed) 
-- Not Significant                            + Equal variances assumed 

Table 2: Views on Snowden and Motivation 


Degree of agreement with statement, 1=strongly disagree; 5=strongly agree

In my view, Snowden’s actions...
    Country        Phase 1            Phase 2

..make me feel personally more secure.
    U.S.           M=2.51; s=1.044    M=2.66; s=1.098
    India          M=3.24; s=1.088    M=3.43; s=0.992
    Δ in Means     0.733              0.779
    t-statistic    4.241**+           4.648**+

..have damaged U.S. national security
    U.S.           M=2.78; s=1.261    M=2.68; s=1.270
    India          M=3.36; s=1.215    M=3.26; s=1.208
    Δ in Means     0.577              0.583
    t-statistic    2.858**+           2.952**+

..have damaged all democratic nations’ security
    U.S.           M=2.15; s=1.153    M=2.30; s=1.159
    India          M=3.14; s=1.226    M=2.75; s=1.156
    Δ in Means     0.985              0.453
    t-statistic    5.078**+           2.461*+

..make me less confident in my government’s oversight of our nation’s security.
    U.S.           M=3.74; s=0.994    M=3.84; s=1.116
    India          M=3.32; s=1.192    M=3.17; s=1.124
    Δ in Means     -0.417             -0.665
    t-statistic    2.303**            3.738**+

..in the long run will make for a stronger and more secure U.S. society
    U.S.           M=3.07; s=1.097    M=3.31; s=1.123
    India          [illegible]        M=3.54; s=0.867
    Δ in Means     0.140              0.224
    t-statistic    --                 --

..negatively affects all democratic societies, U.S. and others
    U.S.           M=2.36; s=1.181    M=2.26; s=1.151
    India          M=3.10; s=1.257    M=3.00; s=1.209
    Δ in Means     0.748              0.742
    t-statistic    3.790**+           3.955**+

..will make little difference in our security as a society
    U.S.           M=2.94; s=1.103    M=3.00; s=1.000
    India          M=3.55; s=0.958    M=3.28; s=0.998
    Δ in Means     0.610              0.275
    t-statistic    3.597**+           --

** Significant at the .01 level (2-tailed)
* Significant at the .05 level (2-tailed)
-- Not Significant
+ Equal variances assumed

Table 3: Views on Potential Consequences


4 Limitations and Discussion 


This exploratory study has all of the well-known limitations of surveys: issues of sampling error, sampling 
bias, measures at a single point in time, etc. The use of MTURK introduces new challenges to these 
limitations, and these have been widely discussed elsewhere (e.g., Chandler et al., 2013).

The dimensions of culture as articulated by Hofstede have been criticized by many scholars as 
lacking in theoretical foundation and as subject to misuse. Hofstede himself acknowledges that this 
framework is empirical and often misused, and addresses some of these critics and criticisms as part of a 
discussion in Human Relations (see McSweeney, 2002, and Hofstede, 2002). We use the construct as a useful
framework for comparison and do not seek to ground it in personal or societal theories.

In this study, we address some of these limitations (e.g., we control for quality, we take measures at
two points in time). However, as an exploratory study of differences, we are not seeking to create ratio
scales or generalize beyond the samples. Consequently, our findings, as in many exploratory studies,
make no claim to be definitive, but they do stimulate thought and point the way toward additional studies.

The differences we observe between the U.S. and India samples are provocative and worthy of
discussion. However, the two statements on which there is agreement (the general tendency to think that 
Snowden was motivated by conscience and the tendency to agree that his actions will have a long-run 
positive impact on US society) may be even more interesting. Do these statements tap into some kind of 
universal attitude or value system? 

In terms of surprise value, the reversal of the anticipated difference in the measure of PDI (power 
distance) surpasses other findings. We find this puzzling and can think of two possible explanations. One 
possibility is that the sample is simply faulty—that our controls to assure thoughtful responses failed and 
the responses are flawed. Another possibility is that MTURK samples are distinct from the general 
population (at least distinct from the population demographics of samples in prior studies using the Hofstede 
dimensions). 

To explore the first possibility, we went back to the respondents to the first two surveys, and we
additionally solicited new respondents to the survey to see if the results would change. The results were
similar. The PDI differences between the U.S. and India samples remained the same, i.e., the reverse of
what has been found in other studies.

This leaves us pondering the second possibility: might the MTURK approach to sampling tend to
elicit respondents who differ substantially from the expected societal values? We would like to explore this
and other possible explanations further in future research.

Abstract

Information sharing among law enforcement officers and between law enforcement officers and the public
is crucial to creating safe neighborhoods and developing trust between members of society. Since the
terrorist attacks on the U.S. in 2001, the U.S. government has implemented a program called the information
sharing environment; for both national security agencies and local law enforcement, communicating and
sharing information is a top priority. Human information behavior and human information interaction
research has been conducted in a variety of environments, yet there is little research related to law
enforcement and the public. This note presents early case study research into this complex information
sharing environment. The work builds on the strong tradition of research in information science related
to information behavior and hopes to bridge the gap between security and law enforcement conceptions
of information sharing and those of information science. This research is being conducted with the
collaboration of a major metropolitan police department in the southern United States. The diverse
research team brings together an academic, a law enforcement consultant and a constable from Toronto,
Canada. While one deliverable of the project is to provide the law enforcement agency with a strategic
communication and social media plan, the larger goal is to begin a multiple case research project to
develop our understanding of information sharing with these types of unique stakeholders and in these
complex environments.

1 Overview


This paper presents work in an ongoing research project to understand and improve information interaction 
and the information sharing environment between law enforcement and the public. The practical goal of 
this phase of the project is to develop communication strategies and policy that incorporate social media 
tools for a large metropolitan police department. The work is being conducted by a diverse team of 
researchers: an academic, an American law enforcement consultant and, for this initial phase, a Canadian
constable with expertise in the use of social media to help create safe and successful neighborhood / police
collaborations. 

The relationships between law enforcement and the public are often contentious. Across the U.S.,
there have been concerns about excessive use of police force against citizens, a lack of community engagement
in high crime neighborhoods, and frequently a widespread environment of distrust of the police among
community members. Information sharing in this environment is crucial, yet plagued by a history of
suspicion on both sides, which creates an extremely complex environment for information interaction. In
addition to these basic obstacles, law enforcement agencies can be difficult to engage in research. They are
in a continual battle to justify their actions in carrying out the jobs they’ve been sworn to do; the culture
of the thin blue line remains strong (Noaks & Wincup, 2004). Social media is the most recent in a long line
of techniques used by law enforcement to reach out to the public and bridge some of these gaps. How best
to use these tools, and what kinds of policies need to be in place to structure their use, are important
questions.


2 Conceptual Framework 


Human information interaction and information sharing are key concepts in the field of information science.
Under the umbrella of human information behavior, Fidel has published numerous articles, and most
recently a book, discussing the importance of conceptualizing information behavior as interaction (Fidel,
2012). Understanding how we interact with systems (including each other) is a key to designing better tools
to help us carry out whatever work we’re tasked to complete. Related research has focused on academics,
students, people in the healthcare field and janitors (Chatman, 1991; Fidel, Mark Pejtersen, Cleal, & Bruce,
2004; Pettigrew, Fidel, & Bruce, 2001; Savolainen, 2009; Solomon, 1997; Sonnenwald & Iivonen,
1999; Wilson, 2000). One group that has not been closely studied, however, is law enforcement.
In a military context, Sonnenwald and Pierce (2000) looked at information behavior in military command
and control situations, and much of their work can be applied to law enforcement because of the strong
hierarchical nature of information exchange in these environments. This research adds to the robust
literature in information behavior and information interaction by bringing the complex work environment
of law enforcement to light through the case study method (Fidel, 1984) and a focus on information
sharing, be it for intelligence gathering or for community engagement via community policing. This research
also draws on law enforcement and national security discourses of the information sharing environment,
which took on a new role following the terrorist attacks in the U.S. on 9/11.


2.1 Community Policing 


Community policing is an initiative that first became popular in the United States in the 1970s. The goal
and logic behind the concept is to increase community awareness of neighborhood safety and crime issues.
Law enforcement needs community engagement to carry out its job effectively, and through active
engagement with the public, law enforcement has hoped to reduce crime and engender trust between the
public and police. Features of community policing include neighborhood watch programs, Night Out events,
and foot and bicycle patrols in communities, which aim to increase face recognition and, it is hoped,
trust.

Community policing has been met with a fair amount of criticism and early research suggested that 
these efforts did little to reduce crime in neighborhoods and in fact may have resulted in merely moving 
the criminal element and behavior to another area of the city (Fridell & Wycoff, 2004; Kerley & Benson, 
2000; Marx, 1989; Marx & Archer, 1971; Rosenbaum, Graziano, Stephens, & Schuck, 2011; Sung, 2001). 
More recently, critics have asserted that community policing is actually a result of the “retreat of the police 
and the public diffusion of surveillance responsibilities” (Reeves, 2012, p. 238). 

There are three key elements that must be in place for community policing to be successful. First
is an atmosphere of trust between the public and law enforcement, where the public feels that police officers
are there to protect them from harm. Second, law enforcement must be transparent, so that the public feels
that the police are honest and fair in enforcing the law. Third is a willingness on both sides to engage in
dialogue, where members of the public feel safe enough to share vital information as engaged, collaborative
community members (Fridell & Wycoff, 2004). This includes a willingness on the side of the public to share
information, including crime tips, with the police. This willingness is thwarted by demonizing these
individuals as “snitches”, whom Brown (2007) characterizes as cooperators and informants of crime in a
corrupt criminal justice system that enforces disloyalty as a means to negotiate lesser convictions resulting
in less jail time. “No Snitching” campaigns have taken off as hip and cool, however, because the
street/community-based repercussions against people who are willing to cooperate with the police are often
grave. For example, in early 2012, a Philadelphia woman was murdered after she went to the police and
identified the thief who burglarized her store (Stamm & Chang, 2012). This kind of incident is not an
isolated occurrence, and even if police do develop relationships of trust in communities, that trust is often
fragile and rarely, if ever, “enough” to effectively and transparently assure the safety of the public.

Social media is the newest tool law enforcement is adopting in the quest to communicate and engage
with the public (Lam, 2013). Most commonly, law enforcement agencies are using social media such as
Facebook, Twitter and YouTube as a means to “tell their own story” rather than relying on the media. As
a result, the content shared is primarily related to reporting crimes and apprehensions, but also includes
information about community outreach events. While each of these is important, one of the most interesting
affordances of social media is the opportunity it provides to help individuals engage with each other.
Using these tools primarily to “push” information fails to take advantage of their greatest strengths.
Community policing must involve more than information provision.


2.2 Information Sharing 


Information sharing is critical for law enforcement, both internally among fellow officers and externally with
the public. As Pilerot (2012) notes in an examination of the concept and practice of information sharing,
information sharing is context dependent. It is important to note not only the context in
which sharing occurs but also what is shared. As a distinct area of research, information sharing has received
relatively little attention. Wilson (2010) conducted an extensive review of the extant
literature on information sharing and found that examples from the information science, healthcare,
management and information systems fields predominate. He notes a need for increased research in this area.
Two related fields that Wilson does not address, law enforcement and security, have an extensive literature
related to information sharing. As stated above, much of this work has been done since the terrorist attacks
in the United States in 2001. The attacks were seen, in part, as a failure of the information sharing
environment (National Commission on Terrorist Attacks upon the United States, 2004).


2.3 Technology: Social Media 


Social media describes a collection of applications that can be used to share information in an online
environment. Though dialogue is not a necessary part of the definition of social media, an implicit aim is
to create a forum not only for information sharing but also for information exchange. These tools also allow
users to share information about their own networks (boyd & Ellison, 2007). Some of the most well-known
social media applications today include Facebook, Twitter, and YouTube. Other popular social media
platforms include Vine, Foursquare, and diverse photo sharing applications like Instagram and Flickr.

The popularity of these applications is based in part on the expanded use of smartphone technology.
A recent Pew report stated that over 53% of adult Americans now own smartphones (Smith, 2013). For
many cell phone users, posting information about their daily activities, sharing articles they find
interesting, or even making plans with friends has become tied to cell phone use. A text message
can be distributed to Facebook and Twitter with relative ease. In fact, the geo-locational feature that is
activated by default on smartphones can be used to check in or connect with the locational application
Foursquare. All of these applications are becoming increasingly intertwined, as evidenced by the ever more
frequent prompt that one can log in to certain applications using one’s Facebook login information. Social
media provides us with the tools to broadcast our interests and proclivities widely.
It is clear that social media can be used in a dizzying variety of ways, ranging from simple posting of
information, links, photos, audio files, etc. to ongoing and complex exchanges of information. In turn,
social media can also be used as an investigative tool. Common cases of this are recounted by employers or
job seekers who have used social media to get more information about an organization or an individual.
Law enforcement also uses social media to identify criminal behavior. For example, after the Stanley Cup
riots in Vancouver, B.C. in 2011, law enforcement was able to identify many of the perpetrators through
the posts they made on Facebook (Trottier, 2012).

Social media tools are becoming a common feature in law enforcement agencies’ communication
practices. Little research has been conducted on the adoption and use of social media by law enforcement,
though it has become recognized as an effective tool (Heverin & Zach, 2010, 2011). Early adopters of social
media were new recruits or officers in the lower ranks of police forces. These individuals began using
applications like Twitter and Foursquare to connect with other police and members of the community.
Often these messages were alerts about traffic problems or road closures. At other times officers tweeted
about events they were participating in with members of the community, and at times officers would simply
“check in” via Foursquare with members of the public they encountered on their beat. Their superiors
rarely approved of this police / public engagement at first. Social media policies across the board
reflect a concern that members of the force may say something that could be considered
unethical or cast the force in a bad light. These policies mirror earlier policies that related to email
usage. Social media is also being used for intelligence gathering (Hays, 2012; Wyllie, 2013).

Perhaps one of the most publicized methods for employing social media in law enforcement is simply
officers searching public social media accounts for Twitter hashtags, image taglines or Facebook postings
that exalt criminal activity. Many individuals who commit crimes are eager to publicize their exploits;
the case reported by Bindley (2013) is only one of many examples. While it may seem surprising that these
individuals aren’t aware of the fact that their posts are public, or easily accessed, this tendency has become
a primary means of intelligence gathering (Trottier, 2012; Wyllie, 2013). A somewhat more interactive method
is accomplished by creating multiple identities on different social media sites and attempting to be
“friended” either by a direct suspect or by someone within the social network of criminals.


3 Method 


This is both a formative analysis and action research. We are examining the communication patterns
of the Dallas Police Department and will introduce ways of enhancing communication both within the
department itself and between law enforcement and the public using social media. Information and
communication technologies (ICT) are critical to this process, and techniques for better use of social media
(Twitter, Facebook, YouTube, etc.), traditional methods such as email and even face-to-face communication
are necessary to increase information sharing and trust between all constituencies. Our methodological
approach is primarily qualitative, using the case study method and exemplified by focus groups and
interviews (Fidel, 1984). In addition, we are using surveys to get a quantitative baseline of the
organizational and communication practices, including awareness and use of social media within law
enforcement and between law enforcement and the public.


3.1 The Site 


The Dallas, Texas police department (DPD) has over 3,400 sworn officers and serves a population of over
1.2 million people. The metropolitan area is divided into six sections: North Central, Central, Northwest,
Southwest, Northeast and Southeast. Each division is organized in a similar fashion and includes a 
“Community Engagement Unit.” This Unit works as a liaison between the police department and the 
community. The officers assigned to this group work to develop a strong communication relationship with 
citizens within each division. They work with citizens to solve quality-of-life issues and educate the 
community about programs being offered through the department. 

This project will be conducted in four phases: 1) initial interviews and focus groups; 2) survey
design and launch; 3) additional focus groups and interviews; 4) communication strategy and policy. Data
analysis will be iterative and inform each phase of the project (Strauss & Corbin, 1998). We have completed
phase one and are currently starting phase two. We have conducted 5 focus groups and 5 introductory
interviews. Police culture follows a rigorous chain of command, so our initial contacts were arranged through
the Lieutenant for Media Relations with support from the DPD Chief of Police. The Lieutenant introduced
our team prior to each focus group and then left the room. The primary purposes of these meetings were
twofold: first, to learn about the department in a broad sense and second, to gather information to better
design the surveys that we are now distributing both internally to the department and externally to the
community. One question on the survey asks if the individual would be willing to participate in a telephone
interview. If so, they will be directed to a secure sign-up page. This information will be kept completely
anonymous.


3.2 Focus Groups 


As stated above, we have conducted a total of five focus groups. One group consisted of citizens living in
the North Central division of the DPD. As the research progresses we will hold additional focus groups with
citizens in each division. Our goal with the first group was to get a general idea of the issues participants
felt were important and of their satisfaction with communication channels between themselves (and the groups
they belong to) and officers assigned to their division. We also asked them to discuss the current tools they
use to find information about their communities.

Within the DPD we conducted four additional focus groups: one with union representatives, one 
with civilian employees working in the department and two with a mix of sworn officers from different 
divisions. The purpose of these meetings was to provide information about our research to representatives 
throughout the department and to get a general idea about the communication environment within the 
department. This included asking about the tools they use to communicate with others, their general 
satisfaction with interdepartmental communication, and obstacles they feel keep them from sharing or 
receiving the information they need to carry out their jobs. 


3.3 Individual Interviews 


We also had the opportunity to conduct one-on-one interviews with two sworn officers from the Dallas
Police Department’s Gang Unit, which is responsible for documenting and tracking gang activity within
the city; the Deputy Chief overseeing Patrol; the Assistant Chief in the administrative and support bureau;
and the director of a Division working with the homeless and mentally ill. As with the focus groups, the
structure of these interviews was open. Our goal was to get a sense of the current communication environment
and the level of satisfaction each had with the information interaction, and to introduce our own goals for
developing strategy and policy.


4 Next Steps 


Field notes from the initial focus groups and interviews are currently being analyzed, and links to the
surveys have been sent to the Dallas Police Department Media Relations division. The links will be
sent to officers as well as community groups throughout the city. We will also advertise the availability of
the surveys in local libraries and community centers. We intend to close the surveys 6 weeks after their
release. At that point we will compile the data using the features available through SurveyMonkey. Any
responses we receive for telephone interviews will be addressed as they come to us. We will conduct,
transcribe and then code the interviews. By the end of the year we will present the Dallas Police
Department with a communications strategy and social media policy. We hope to follow up with the
department directly.

This phase of the research is product based. Throughout this process we will be developing a theory
of information interaction as it relates to law enforcement and the public. As mentioned earlier in this note,
this area of information interaction research has received little attention from the Information Science
community. We hope that our continued work and future case studies will lead to an understanding of this
aspect of information behavior and inform future policy in law enforcement and national security.

Information sharing within law enforcement agencies and between law enforcement officers and the public
is complex. Between the public and law enforcement, power imbalances, suspicion and distrust make the
process of sharing information even more challenging, and the militaristic chain of command mentality
makes breaking out of traditional communication and sharing patterns a potentially insubordinate act. This
note is intended to serve as an introduction to an ongoing, multiple site case study with the end goal of
creating a framework for including the complex environment of law enforcement and security in information
interaction scholarship.

Abstract

This paper argues that theorizing computer-mediated communications as political engagement within a 
sociomaterial perspective, an understanding attentive to the mutual constitution of the social and 
material, allows researchers to conceptually analyze the unique political practices afforded by information 
and communication technologies and their function within a public sphere. This approach foregrounds 
the mutual constitution of sociomaterial practices, and recognizes the centrality of performativity and 
contextual multidimensionality in their constitution and analysis. In addition, articulating patterns of 
these practices as sociotechnical systems presents a framework for scaling local analyses toward increasing
levels of analysis commensurate with public sphere theory.

1 Introduction

manchester8117 
This comment has received too many negative votes 


As a gun owner, I'm appalled at how easy it is to get ammo on the spot. I have never said on a 
whim... "I need a couple of hundred bullets". I know how many are in the safe, How many I need. 
When I'm going hunting. Waiting a few days is no big deal. Knowing that a law may stop someone 
from stockpiling makes me sleep better at night. Having a gun is a right but right's have limits. 


matthaus ayers 


@manchester8117 your an idiot. I think your right to free speech should be limited to silence. How 
about that for limits? Thats the same as castrating my ak to 7 rounds. You can defend yourself 
from 2 mab 3 attackers with seven 7.62x39 rounds. You live in a whimsical land of fairytales where 
you think bad people are going to follow these asinine laws. WRONG! Whens the last time a 
criminal followed the law? Oh wait...thats not their cup of tea.! 


You most likely never read these comments. They were posted on YouTube in the wake of the New York Secure
Ammunition and Firearms Enforcement (SAFE) Act, which was signed into law on January 15, 2013. You never
read them because they are but two among tens of thousands posted in the weeks following the legislation,
which effectively banned firearms classified as assault rifles and limited magazine size on all weapons
within the state of New York to seven rounds. They are two among the millions of comments posted each day
on YouTube and across the web. It is understandable that you did not read them; in fact, why would you?

This paper begins an answer to this question in a particular way. What, if anything, can be learned
from this reading, and how should one read online civic discourse? More precisely, as researchers, what
theoretical approaches disclose computer-mediated communication (CMC) as a distinct technologically-mediated
form of political engagement, and how can a particular digital text be read beyond the context of
its own enactment?


1 http://www.youtube.com/watch?v=LZ1W45Aq7Ge

These questions extend from an ongoing analysis of CMC of which the above comments are a part. 
We pursue here a theoretical approach for the analysis of digital public spheres in which such analyses may 
be more rigorously undertaken. We hope such a theoretical discussion will open avenues toward more 
effective analyses of political engagement online. This discussion bridges social science theorizations of the 
public sphere with recent scholarship in Information Science (IS) that offers a more robust approach to the 
analysis of CMC through their theorization as sociomaterial practices. An attempt will be made to redress 
the lack of theoretical crossover from IS to outside social science research recently identified by Sawyer & 
Jarrahi (2013). 

Thus this paper argues that theorizing CMC as political engagement within a sociomaterial 
perspective, an understanding attentive to the mutual constitution of the social and material, allows 
researchers to conceptually analyze the unique political practices afforded by information and 
communication technologies (ICT) and their function within a public sphere. 

Before proceeding, a brief clarification of the concept of “public sphere” must be made. Conceived
originally “as the sphere of private people come together as a public,” Jürgen Habermas’s foundational work
has been the source of extreme influence and criticism (Habermas, 1962/2008, p. 27). Calhoun (1992)
alternatively describes the public sphere as “a socially organized field, with characteristic lines of division, 
relationships of force, and other constitutive features” (p. 38). The public sphere can thus be conceived of 
as a network of public discourse: “a field of discursive connections” within which “there will be clusters of 
relatively greater density of communication within the looser overall field” (p. 36). 

With this concept of the public sphere in mind, this paper will explore a theoretical approach more 
suitable for the empirical analysis of its constituent elements. Turning first to a problematization of the 
theoretical foundation for much of CMC analyses, the importance of adopting a sociomaterial perspective 
will be then discussed. The central aspects of this perspective will next be described in consideration of their 
affordances for further research. 


2  Problematizing Current Theoretical Approaches 


Returning to his original thesis in The Structural Transformation of the Public Sphere, Habermas (1992), 
weighing thirty years of socio-political change, chooses to close his discussion with a curiously open 
conclusion: 


“Thus if today I made another attempt to analyze the structural transformation of the public 
sphere, I am not sure what its outcome would be for a theory of democracy- maybe one that could 
give cause for a less pessimistic assessment and for an outlook going beyond the formulation of 
merely defiant postulates” (pp. 456-7). 


His speculative hope, though slight, follows from the emergence (between 1962 and 1992, the dates of the 
original publication and the quoted retrospective, respectively) of an “electronic mass media” that, although 
still considered as reifying civic communication toward commercial and administrative logics, allows at least 
an “ambivalent” democratic potential (Habermas, 1962/2008, pp. 163-9; 1992, p. 457). The subsequent
development of the internet and digital media expands the possibilities underlying this ambivalence; yet,
beyond slight elaboration, Habermas never reevaluated the communicative preconditions for a concept of the
public sphere(s), despite the complex changes involving ICTs at all levels of social integration and
reproduction (Habermas, 2006).

Nevertheless, the Habermasian model of communicative rationality, predicated on normative 
standards of formal discourse, has exerted far-reaching influence on CMC research and its contribution to 
the theorization of the public sphere (Lunt & Livingstone, 2013). Although this framework has often been 
critically interrogated as counterfactual to the character of online discourse (Dahlberg, 2001; Wiklund, 
2006), exclusionary and uncritical of power asymmetries (Fraser, 1992), privileging a particular form of 
rational communication (Dahlgren, 2002; Chouliaraki, 2013), or as insensitive to socio-historical context 
(Susen, 2011), relatively little attention has been accorded to the materiality of ICTs in empirical analyses. 
Although theoretical discussions routinely examine “affordances” of digital ICTs, these analyses homogenize 
technological artifacts and their contextual functions, foregrounding social processes and institutions 
without sufficient attention to their material constitution. Attention to materiality as an analytical concept 
is not to argue for a materialist epistemology. On the contrary, insufficient attention has been given to the 
social construction of artifacts, and the constitutive interaction between the social and material as enacted 
within sociomaterial practices (Leonardi, 2012). 

The importance of adopting a sociomaterial perspective within political analyses of CMC is twofold. 
First, attending to the materiality of ICT enactments recovers the emergent capacities of modern digitally 
networked society (Castells, 2007; 2008; Latour, 2011). Dahlgren (2005) outlines a general destabilization 
of traditional mass media with the dispersion of counterpublics across an increasingly fragmented media 
topography. These are conceptualized according to an analytical framework of structure, representation, 
and interaction (p. 148). Typical of many theoretical discussions, Dahlgren’s work acknowledges a 
qualitative break with traditional mediatic relations within society, yet offers their theorization only at a 
high level, lacking analytical concepts necessary for articulating specific sociomaterial practices of 
representation and interaction within wider structural contexts. 

Couldry’s (2008) distinction between the concepts of mediatization and mediality provides useful 
orientation in this respect. Whereas conceptualizing the public sphere with respect to mediatization would 
understand a particular media technology and its “media logic” as transforming an entire field of socio- 
political relations, mediality opens awareness to “specific questions about the role of media in the 
transformation of action in specific sites, on specific scales and in specific locales” (p. 380). Mediality, in 
short, orients research toward the negotiations between media production, consumption, and reception at 
the level of localized practices (Couldry, 2004). A sociomaterial perspective addresses the dual constitution 
of these practices and their relation to a digitalized public sphere. 

Focusing attention on localized practices necessitates a theoretical framework in which these may be 
articulated as part of wider, political processes. Calhoun (1992) notes that Habermas originally neglects any 
discussion of the internal organization of the public sphere, an omission representative of the break between 
subsequent high-level theoretical articulations of the public sphere and empirically driven discussions of 
specific CMC use-scenarios (p. 38). This break is accentuated by the multidisciplinary nature of research 
in this area: analyses extending from political science, sociology, and communications studies often neglect 
the material aspects of sociomaterial practices taken up more extensively by CMC and information science 
research. Thus, the second reason for theorizing the materiality of CMC concerns the capacity, which we
propose to develop here, for vertically theorizing ICT enactments from particular use-scenarios toward
increasing levels of interaction within wider sociotechnical systems. The following sections begin to
articulate the adoption of this perspective.


3 The Sociomaterial Perspective 


Conceptualizing the public sphere as a field of discursive connections necessitates a theoretical approach 
sensitive to digital ICT practices and their relationship within expansive communicative networks. As 
discussed earlier, a sociomaterial perspective is recommended as a means of theorizing the particular 
character of political engagement through CMC, and of analyzing these practices beyond their particular 
contextual enactments. A sociomaterial perspective regards the mutual constitution of the social and 
material within enacted practices, and consists of three interrelated aspects: mutual constitution, 
performativity, and contextual multidimensionality (Parmiggiani & Mikalsen, 2013; Sawyer & Jarrahi, 
2013). 


3.1 Mutual Constitution 


Claiming CMC as both socially and materially constituted effects an analytical orientation acknowledging 
the social construction of material artifacts enacted in human practice, and that this socially-enacted 
materiality is a constitutive agent within these practices (Orlikowski & Scott, 2008; Leonardi, 2012; 
Parmiggiani & Mikalsen, 2013). The mutual constitution of artifacts thus opens up the black box of 
“technology” to analysis and critique. 

Taking as an example the ongoing research on YouTube discourse alluded to above, the act of 
commenting, producing electronic texts such as those in the preface, must be theorized as a sociomaterial 
practice in order to incorporate the qualities of the particular digital medium within analysis. As such, we 
“read” these comments differently. Pulling away from the presupposition of face-to-face dialog and returning 
to the asynchronous, digitally mediated practice at hand also removes us from assumed concepts such as 
author, audience, and community as context. As sociomaterial practice, commenting necessarily opens these 
concepts toward interpretations consistent with the specific material medium, and beyond those presumed 
within embodied intersubjective interaction. For example, because commenting on YouTube hyperlinks the 
commented-upon media to an individual’s networked profile, empirical analysis could examine 
the distribution of media across YouTube subscription networks as a result of the expansion of networked 
media content embedded within the sociomaterial practice of commenting. Thus attention to the mutual 
constitution of CMC practices opens analysis to the unique affordances of ICT that subvert traditional 
presuppositions of discourse interaction. 
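
The kind of empirical question raised in this example can be illustrated with a minimal sketch. It assumes a hypothetical data set of comment events and subscription links; the function name, the input format, and the simplification that every subscriber of a commenter is a potential audience are all illustrative assumptions and are not drawn from the study or from any YouTube interface.

```python
from collections import defaultdict


def potential_reach(comments, subscriptions):
    """Map each video to the users who could encounter it via a commenter's profile.

    comments: iterable of (commenter, video_id) pairs (hypothetical format).
    subscriptions: iterable of (subscriber, subscribed_to) pairs (hypothetical format).
    """
    # Index subscribers by the profile they follow.
    followers = defaultdict(set)
    for subscriber, target in subscriptions:
        followers[target].add(subscriber)

    reach = defaultdict(set)
    for commenter, video_id in comments:
        # Assumption: a comment links the video to the commenter's profile,
        # so the commenter's subscribers become a potential audience for it.
        reach[video_id] |= followers[commenter]

    return {video: sorted(users) for video, users in reach.items()}


# Illustrative, invented data.
comments = [("ana", "v1"), ("ben", "v1"), ("ana", "v2")]
subscriptions = [("cy", "ana"), ("dee", "ana"), ("cy", "ben"), ("eli", "ben")]
reach_by_video = potential_reach(comments, subscriptions)
```

Treating the comment as an edge between media and profile, rather than only as a text, is what allows the distribution of media across the network to become an object of analysis.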


3.2 Performativity 


According to Orlikowski & Scott (2008), “the notion of performativity draws attention to how relations and 
boundaries between humans and technologies are not pre-given or fixed, but enacted in practice” (p. 462). 
The preceding example of YouTube commenting illustrates the centrality of performativity. Conceptualizing 
the practice of commenting entails an epistemic demarcation whereby a specific, contextualized performance 
determines an empirical set of entities for analysis. Leonardi (2012) thus explains practice “as the space in 
which social and material agencies are imbricated with each other and, through their distinct forms of 
imbrication, produce those empirically observable entities we call ‘technologies’” (p. 38). 


3.3 Contextual Multidimensionality 


Contextual multidimensionality constitutes the final aspect of a sociomaterial perspective. This aspect 
reiterates the contextual embeddedness of all sociomaterial practices, while remaining aware of the 
instability of these bounded contexts (Parmiggiani & Mikalsen, 2013; Sawyer & Jarrahi, 2013). 
Sociomaterial analysis revealing the dynamic spatio-temporal relationships of a practice extends awareness 
toward more expansive and complex contextual strata. 

This internal movement toward greater contextual structures presents an affordance of a 
sociomaterial perspective in the analysis of the public sphere(s). Particularly relevant when contrasted to 
the normative theory of Habermas, the contextual multidimensionality of a sociomaterial perspective defers 
to dynamic and scalable empirical investigation for the description and analysis of networked structures. 

Toward this end, Leonardi’s (2012) articulation of an integrated and hierarchical sociomaterial 
framework provides a promising resource. The previous description of practice as the space of a particular 
sociomaterial imbrication presents a concept suitable for low level theorization of “localized experiences 
around a particular or various technologies” (p. 41). As analysis implicates practices within more expansive 
patterns of organization, they may then be conceptualized as constituting a “sociotechnical system.” These 
concepts provide a framework by which empirically explored sociomaterial practices may be vertically 
integrated in order to describe the internal organization of networked public spheres. This framework 
provides an avenue for bridging IS and CMC research with high-level socio-political analyses, and, thereby, 
addresses the deficit in rigorous empirical analysis of digital ICT in the latter. 


| | Mutual Constitution | Performativity | Contextual Multidimensionality |
|---|---|---|---|
| Structure | Interacting sociomaterial practices constituting sociotechnical systems | Dynamic structural configurations of enacted sociomaterial practices | Structure not fixed but spatio-temporally dynamic |
| Representation | Representation as both semantic and material datum | Texts as sociomaterial practices of graphic inscription | Conceptual instability of author, audience, and interpretive context |
| Interactivity | Entanglement and mutual shaping of social and material agencies | Interaction constitutive of subjects/artifacts | Interaction always involves particular sociomaterial practices embedded within localized contexts |


Table 1: Sociomaterial perspective of digital public spheres (Theoretical approach for conceptualizing social 
and material practices as mutually constituting digital public spheres) 


4 Discussion 


Extending this framework toward contemporary IS research, we will now offer conclusions supported by the 
insights drawn throughout this discussion as they address and modify traditional notions of CMC research 
and the empirical analysis of the public sphere. 

Following a sociomaterial approach, CMC research may begin assessing the structure of public 
spheres through a process of integrative analysis. First identifying local sociomaterial practices, these may 
then be analyzed according to their mutual interaction with the aim of delimiting performative patterns of 
interaction. Conceptualizing these assemblages as sociotechnical systems, research could thus begin 
articulating the internal structure of digitally networked public spheres. 

Representation within the public sphere finds increasingly fragmented channels for media 
consumption, production, and distribution. Though these problematize traditional notions of author, 
audience, and producer/consumer, sociomaterial approaches return high-level theorizations toward localized 
practices of inscription and reveal often overlooked features of their enactment. Opening analysis of 
representation to both linguistic and nonlinguistic information flows within concrete material networks 
reorients empirical analysis away from formal theories of deliberative discourse and toward enacted 
representational practice. 

The notion of practice posits a constitutive interaction between social and material elements. Such 
an orientation refuses to neglect the materiality of CMC interactions, and focuses analysis not only on 
intersubjective relationships but on the recursive shaping of material and social agents as they are enacted 
within sociomaterial practices and, by extension, sociotechnical systems. The public sphere thereby exists 
only as a sociomaterial network, and thus requires research designs sensitive to its empirical analysis. 


5 Conclusion 


In returning to the questions posed at the outset, what can now be said in answer? Though answers resist 
in their particular constitution, guidance can be offered in the way we approach their formulation. 
Addressing CMC in light of the public sphere poses at least two central challenges: How do we adequately 
theorize ICTs? How do we integrate local empirical analysis within a discussion of general networked 
structures? 

An attempt has been made to recommend a sociomaterial perspective as an answer to these questions. 
This approach foregrounds the mutual constitution of sociomaterial practices, and recognizes the centrality 
of performativity and contextual multidimensionality in their constitution and analysis. In addition, 
articulating patterns of these practices as sociotechnical systems presents a framework for scaling local 
analyses toward increasing levels of analysis, providing the conditions by which we can, perhaps, more 
effectively analyze CMC toward an understanding of today’s digitally shifting public spheres. 
Abstract 

This paper introduces innovative techniques for conducting research in virtual worlds. We analyze two 
1 Introduction 


1.1 Purpose and Scope 


Conducting meaningful online information behavior research presents researchers with both opportunities 
and challenges in their research design, data collection, and informed consent procedures. Yet, such research 
becomes increasingly important as users experience blended offline/online lives. We leveraged the 
programmability of Second Life, a 3D social virtual world, to conduct design research into new informed 
consent and data collection methods to study information behavior (Marino, Karlova, Lin, & Eisenberg, 
2012). This paper describes how these methods were developed via an iterative design process, and the 
insights they offer into information behavior in programmable 3D virtual worlds. 


2 Motivation 


Phase 1 of our project focused on gathering rich qualitative data (e.g., interviews, participant observation, 
etc.) to better understand information use in virtual worlds. In the process, we designed and implemented 
a novel method of gathering informed consent (Lin, Marino, & Eisenberg, 2010; Marino, Lin, Karlova, & 
Eisenberg, 2010) and developed understandings of the nature of long-term, continuous use in online 
communities (Lin, Karlova, Marino, & Eisenberg, 2012). Questions remained, however, about how users 
solved information problems, such as how users select partners to work with on projects and events and 
how users organize information in three-dimensional spaces. We initially planned a series of small-scale 
experiments to individually probe these questions and others. We were challenged, however, by the 
limitations and costs of this plan. Instead, insights from Phase 1 led us to design and implement the Future 
InfoExpo (Future of Information Seeking and Services Exposition) (Marino, Lin, Karlova, & Eisenberg, 
2012). 

The Future InfoExpo consisted of 6 exhibits, similar to booths at an expo; each exhibit offered 
residents a unique opportunity to play with novel and alternative methods of interacting with information 
inside Second Life (SL). Residents could experience all or none of the exhibits, in any order. The full report 
of the iterative design process and implementation of the Future InfoExpo will be detailed in a future 
publication. In this paper, we discuss the iterative design process of the informed consent procedure and 
the data collection utility, so as to support other virtual world and online researchers. 


3 Related Work 


3.1 Online Information Behavior 


Information seeking, search, and other information behaviors in online settings have been the focus of a 
multitude of studies through the last several decades (e.g., Bates, 1989; Bilal & Kirby, 2002; Fidel et al., 
1999; Head & Eisenberg, 2010). This rich terrain has also prompted new protocols for the design of research 
in online environments (e.g., Hill, 1999; Ju, 2007; Swanson, 2005). In particular, the accessibility and growth 
of virtual environments have revealed new ways in which people access, evaluate, use, and share information 
(e.g., Nowak & Rauh, 2005; Ostrander, 2008; Rieh, 2002; Yee & Bailenson, 2007). Few studies, however, 
have taken a comprehensive approach to how people interact with information for the purpose of solving 
problems in these environments. Moreover, little is known about why people will use many tools, or 
transition among many environments, in order to complete even simple tasks. 


3.2 Virtual Worlds as Information Systems 


Virtual environments hold promise that information problem situations will be supported by an environment 
in which information and communication systems are seamlessly integrated (D'Agustino, 2013). Wasko, et 
al. (2011) observed that virtual worlds are, “starting to hit the mainstream with potentially transformational 
technologies,” (p. 654). They noted that, for example, 10-year-olds are more interested in the avatar 
experience in virtual environments (such as in the Disney-owned Club Penguin™) than in what other people 
are doing on social networking sites. In time, these users, in addition to a video gaming generation, will 
likely force changes in socializing and working, such that, “the borders between work, play, and learning 
dissolved or at least be reshaped” (p. 646). Such changes are already taking hold, as Livingstone (2011) 
recently noted: “ ... the educational use of Second Life has quietly, slowly, and gradually developed and 
grown — seemingly impervious to the media din” (p.62). 

Further, Wasko, et al. argue that virtual world research will influence, “how perspectives around 
the design of information systems need to adapt and change to account for the flexibility and variability in 
virtual world environments” (p. 646). Virtual world research offers creative and innovative insights into the 
design of, not only virtual worlds, but other information systems as well. Chaturvedi, Dolk & Drnevich 
(2011) conclude that VWs form a new type of information system, a type that is not yet accurately described 
by current information system design theories, but will become increasingly integral to a comprehensive 
conception of information systems. 


3.3 Virtual Worlds as Immersive Technologies 


The Future InfoExpo starts from the premise that 3D, immersive, social VWs like SL have not yet reached 
maturity, and that their affordances are not well explored, experienced, understood, or communicated 
(Bessiere, Ellis, & Kellogg, 2009). SL and similar environments have the potential to leverage their 
immersive, 3D, virtual, and social qualities and become a valued and preferred medium for information 
problem-solving for specific information seeking, use, and communication activities (Bainbridge, 2007). In 
addition, various virtual capabilities (avatars, 3D visuals, immersion, interactivity, movement in virtual 
space, for example) will become commonplace in computer, communication, and recreational systems 
(Wasko et al., 2011). SL does represent a 3D, immersive, social VW with rich potential as an information 
problem-solving setting due to: 1) emerging patterns of use and expectations; 2) unique, immersive, and 3D 
capabilities; and 3) integration of cutting edge technologies. 
4 Methods 


4.1 Rationale 


The Future InfoExpo was based on a design methodology to simultaneously demonstrate capabilities and 
evaluate improvements in support of diverse information practices through a particular technology — Second 
Life. The design-thinking approach also saved time in identifying, from users’ evaluations, those specific 
design features requiring revision or worthy of further development. A similar approach has been used in 
the design of an environment for those affected by post-traumatic stress disorder (Moore, 1995). We 
struggled, however, with finding appropriate procedures or tools to help mitigate the challenge of obtaining 
informed consent in an online space and of collecting data from so many participants simultaneously. Thus, 
over a few months, we iterated and prototyped numerous design options before arriving at Hugo and the 
HUD. In total, 132 participants interacted with Hugo and the HUD. 


4.2 Automated informed consent via consent bot, Hugo 


Conducting research in virtual environments can be challenging when it comes to addressing the often- 
conflicting complexities of institutional review boards, corporate owners of the virtual environment, and 
users’ expectations. However, the programmable nature of some VWs enables the creation of features that 
can meet these requirements. To enter any of the exhibits, participants first proceeded through the Future 
InfoExpo Entrance. This served as the general welcome area, providing participants with textual 
information about the Future InfoExpo and about the informed consent process, required by our 
institution’s internal review board. Seated at a desk in this welcome area, the consent bot was presented 
visually to participants as looking like a robot. Hugo served as a ‘consent bot,’ an automated system 
facilitating the informed consent process. 

This system was originally designed and implemented in Phase 1 of our project. During that process, 
we sought to increase the transparency of our research activities. We wanted to ensure that all potential 
participants were offered fair and equitable opportunities to engage with the informed consent process. The 
consent bot in Phase 1 was a stationary object that automatically detected the presence of an avatar within 
20 meters of its location. This ‘consent bot 1.0’ informed these incoming avatars of the researchers’ presence, 
research objectives, research activities, and offered users the ability to accept or decline participation at 
that time. Similarly, Hugo, our Future InfoExpo consent bot, also provided these functions. 
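
The detection behavior described above can be sketched as follows. This is a minimal illustration in Python, not the script used in Second Life, which the text does not describe; the function names, the coordinate representation, and the choice to treat exactly 20 meters as in range are assumptions made for the example.

```python
import math

DETECTION_RADIUS_M = 20.0  # radius reported in the text


def avatars_to_prompt(bot_position, avatar_positions, already_prompted):
    """Return the names of avatars in range that have not yet been prompted.

    bot_position: (x, y, z) coordinates of the consent bot, in meters.
    avatar_positions: dict mapping avatar name to (x, y, z) coordinates.
    already_prompted: set of avatar names that have already seen the notice.
    """
    in_range = []
    for name, position in avatar_positions.items():
        if name in already_prompted:
            continue  # assumption: each avatar is informed only once
        if math.dist(bot_position, position) <= DETECTION_RADIUS_M:
            in_range.append(name)
    return sorted(in_range)


def record_response(consent_log, name, accepted):
    """Store an avatar's decision to accept or decline participation."""
    consent_log[name] = "accepted" if accepted else "declined"
    return consent_log


# Illustrative, invented positions.
nearby = avatars_to_prompt(
    (0.0, 0.0, 0.0),
    {"a": (3.0, 4.0, 0.0), "b": (30.0, 0.0, 0.0), "c": (0.0, 20.0, 0.0), "d": (1.0, 1.0, 1.0)},
    already_prompted={"d"},
)
```

Keeping a record of who has already been prompted is one way such a bot could offer every incoming avatar the same notice without repeating it.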

For the Future InfoExpo, however, we iterated several modifications in our adaptation of the earlier 
consent bot. The most noticeable change was that we created a robot-like visual appearance for the bot. 
Prior feedback from SL users suggested that this visual presentation, instead of a human-looking avatar, 
would invite a formalized, business-like interaction with participants, rather than a chatty, highly socialized 
interaction. It is unclear whether users expected a live human being behind a human-looking avatar, or if 
they expected a technologically sophisticated Artificial Intelligence, similar to IBM’s Watson computer. 
While we did not aim to collect data about participants’ interactions with Hugo, many participants 
commented to us via in-world live text chat. Because we were not expecting these comments, no formal 
analysis could be conducted. Instead, we analyzed these comments in team debrief meetings and 
reflections. Given participants’ positive feedback regarding Hugo (via anecdotal evidence), we anticipate 
that the design of Hugo the consent bot could serve as a model of the informed consent process for research 
in SL and other VWs. 


4.3 Automated data collection via Heads-Up Display (HUD) 


Our ‘Heads-Up Display’ (HUD) was a unique feature adapted for the Future InfoExpo to guide and assist 
each participant through the exhibits and survey instruments. A HUD is an additional visual element on 
the screen (appearing above the avatar) that stays with each participant and is only visible to that 
participant. Our HUD Orientation provides an orientation and introduction to our HUD to familiarize every 
participant, regardless of experience, with its functions and use in the exhibits. Our HUD tracks progress 
through each of the exhibits via shading cues, provides exhibit-specific information, enables a help chat 
system connecting participants with researchers regardless of location within SL, provides teleportation 
back to the Future InfoExpo entrance, and determines eligibility for a gift upon completion of exhibits. 
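
The progress-tracking and gift-eligibility functions can be illustrated with a small sketch. The class and method names are hypothetical, and the rule that a gift requires all six exhibits to be completed is an assumption; the text states only that eligibility was determined upon completion of exhibits.

```python
from dataclasses import dataclass, field

EXHIBIT_COUNT = 6  # the text reports six exhibits


@dataclass
class HudProgress:
    """Per-participant record of completed exhibits (illustrative only)."""

    completed: set = field(default_factory=set)

    def mark_completed(self, exhibit_number: int) -> None:
        if not 1 <= exhibit_number <= EXHIBIT_COUNT:
            raise ValueError(f"unknown exhibit: {exhibit_number}")
        self.completed.add(exhibit_number)

    def shading(self) -> list:
        # One shading cue per exhibit, in exhibit order.
        return [
            "shaded" if number in self.completed else "unshaded"
            for number in range(1, EXHIBIT_COUNT + 1)
        ]

    def gift_eligible(self) -> bool:
        # Assumption: the gift requires every exhibit to be completed.
        return len(self.completed) == EXHIBIT_COUNT


progress = HudProgress()
progress.mark_completed(1)
progress.mark_completed(3)
cues = progress.shading()
```

Because exhibits could be visited in any order, a set of completed exhibits is sufficient here; no visiting sequence needs to be stored.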

Importantly, our HUD administered our survey instruments to participants. It presented an exhibit- 
specific survey to participants upon their completion of that exhibit. Participants used our HUD to answer 
the survey questions, including providing written responses and scale responses. Out of 132 participants, 95 
(72%) specifically commented on the HUD via one open-ended, text-response question on the Orientation 
survey; these responses were formally analyzed using content analysis. Most participants experienced a good 
interaction with the HUD; comments included: 


• “It was helpful to be offered the URL for the PDF outlinging the actual Study.” 
• “ .. the clarity of the graphics and interactivity helped keep my focus and made perfect sense.” 
• “It is a lovely build, very easy to navigate because it is set up in a logical flow.” 
• “This is fantastic. the CVL has an orientation as well as the VAI and this is as good if not better. 
I think it will help a lot. easy to understand.” 
• “Useful setup especially for new residents. Simple and not overwhelming” 


While the HUD concept has been used in other SL projects (e.g., Holloway, 2013), our HUD represents a 
unique system for tracking participant progress and for delivering and organizing information that as yet 
has no counterpart outside of the VW environment. 


5 Discussion 


5.1 Challenges and Weaknesses 


During the processes of designing and implementing, we encountered challenges general to the project, but 
also specific to each tool, Hugo the consent bot and the ‘Heads-Up Display’ (HUD). Designing for an optimal 
user experience, especially regarding Hugo, presented our first challenge. This challenge was three-part: 
first, we needed to ensure the logistics of informed consent were executed thoroughly and sufficiently to our 
institution’s review board requirements; second, we needed a method of doing so that would scale to support 
numerous users simultaneously; third, we wanted this process to be open and user-friendly for participants. 
Technological innovations, designed in cooperation with our developers, 2b3dStudios, helped us resolve the 
first two parts. Although we tried to imagine and test many different scenarios of how users might interact 
with or respond to Hugo, a small number of participants were still confused by and/or uncertain about 
notifications provided by these tools or their next steps after interacting with these tools. Due to the 
inevitability of these situations, at least one researcher was present and identified as a research team member 
to help answer questions and guide participants. 

The HUD design process mandated a twofold approach: the participants used the HUD to help 
track their progress, provide information about the exhibits, etc.; we, however, used the HUD to capture 
survey input data. Given this double duty of the HUD, we faced concerns about the HUD’s user-friendliness; 
to address these concerns in part, we also developed a HUD Training module, required for participation. 
Moreover, the HUD required visual appeal and clarity to avoid overwhelming participants. On our end, the 
HUD design process presented significant technical difficulties, largely related to exporting the data out of 
Second Life and getting it into a human-readable format suitable for analysis. Additionally, we tested many 
different types of survey questions and data formats. After much iteration, we figured out a way to email 
each participant’s formatted responses to a dummy email account. These responses, however, still required 
cleaning and additional formatting. 
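
The cleaning step mentioned above might look like the following sketch. The format of the emailed responses is not given in the text, so the "field: value" layout, the field names, and both function names are hypothetical and serve only to illustrate turning message bodies into a tabular format suitable for analysis.

```python
import csv
import io


def parse_response(body):
    """Parse one emailed response body of 'field: value' lines into a dict."""
    record = {}
    for line in body.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split once so values may contain colons
        record[key.strip().lower()] = value.strip()
    return record


def to_csv(bodies, fieldnames):
    """Convert response bodies to CSV text; missing fields are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for body in bodies:
        writer.writerow(parse_response(body))
    return buffer.getvalue()


# Illustrative, invented message bodies.
bodies = [
    "Participant: p01\nExhibit: 2\nRating: 4\nComment: easy to use: very clear\n",
    " participant : p02 \nexhibit: 2\n\nrating: 5",
]
table = to_csv(bodies, ["participant", "exhibit", "rating", "comment"])
```

Normalizing field names and tolerating missing fields addresses the kind of inconsistency that would otherwise require manual cleaning of each response.
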
5.2 Opportunities and Strengths 


We needed to innovate these tools because almost none existed ready-made. The dearth of available tools 
reflects the newness of online research, especially virtual world research. Informed consent can be daunting 
for both researchers and participants, but most participants responded so well to Hugo the consent bot and 
to our HUD that we believe both tools could be models for other informed consent procedures. For example, 
in online research, scale can be a troublesome issue, but both Hugo and our HUD deftly handled multiple 
simultaneous participants, while providing a consistent experience for all. Similarly, ensuring 
participants have relevant study information can be challenging, but by serving as participants’ first point 
of interaction, Hugo also prepared participants for our HUD’s automated information delivery mechanisms. 

Looking beyond our own study, Hugo the consent bot could be modified in various ways. For 
example, Hugo could be used to screen study participants, based on their avatar data, or to offer participants 
different or specific levels or types of participation, rather than just a blanket level or type for all 
participants. Our HUD could be modified visually to reflect the context of another study, and we would be 
especially interested in additional ideas for data collection and for resolving the difficulty of exporting the 
data out of Second Life and into other clean formats. 


6 Conclusion 


This paper presents the unique opportunities and challenges afforded by conducting information behavior 
research in virtual worlds. This paper also presents the rationale and design of the automated informed 
consent bot, ‘Hugo’, and the automated data collection device, the HUD. 

Phase 2 of our project sought to extend our understandings of information problem-solving in virtual 
environments and how general users, as well as information providers, can leverage immersive, 3D 
capabilities for effective and efficient information seeking, use, organization, and sharing. Information and 
communication systems are becoming more immersive and pervasive. Concurrently, new affordances and 
uses are possible, which users and information providers will need to learn and adopt to become efficient 
and effective. 

Whether the promise of virtual environments is realized remains to be seen. However, the 
exploration of information behavior in virtual environments is useful in clarifying a vision of the future in 
the digital age. It remains to be seen how the affordances of these environments may change information 
behavior. When the moving picture camera was first invented, it was used to film stage productions from 
the back of the theater—the new and exhilarating ways to tell a story with a film camera had not at first 
been imagined. This project seeks to imagine some of the possibilities of new and exhilarating ways to learn, 
communicate, and solve problems in the digital age. 

needed to explain the process of (non-)adoption and (non-)use of digital initiatives within the sector. 
This research seeks to unfold the complexity of the shaping of ICT through a case of swimming sector 
investigating the dynamics of ICT practices, my study seeks to understand the arrangement of such 
The high turnover of volunteers, the cruciality of volunteering time offered into organisation, the 


1 Introduction 


Voluntary Sector Organisations (VSOs) play a critical role in society (Kendall, 2003). To achieve their 
missions, VSOs, like business firms and governmental agencies, are adopting digital technologies 
to respond to sector challenges such as competition for funding and volunteers (Burt & Taylor, 2000). 
Hence, ICT has proliferated in the sector, and there is a growing demand for studies of ICTs in VSOs 
(Pereira & Cullen, 2009). 


1.1 Literature Review 


Two major research streams have provided insights into this topic: first, studies that investigate 
what can be done ‘before’ an IT project to increase adoption and use (e.g. Hackler & Saxton, 2007), and 
second, studies that focus on what happens ‘after’ an IT initiative in terms of organisational consequences 
(e.g. Burt & Taylor, 2003). These studies, however, suffer from both a methodological and a conceptual 
inadequacy. Methodologically, the lack of multi-level analysis limits our understanding of the dynamics 
and possible mutual effects of different actors among VSOs (cf. Iverson & Burkart, 2007; Malina & Ball, 
2005). Conceptually, such studies, with their deterministic approach toward ICTs, have not adequately 
conceptualised the duality of technology and organisation (e.g. Hart, 2002; Zhang & Gutierrez, 2007). 


1.2 Research Pathway 

To generate better insights into the field, this research aims to explore ICT-related practices within the 
specific context of VSOs through interdisciplinary research. To open the black box of how ICTs are 
being practised, this research narrows its focus to the case of small-scale VSOs which operate in 
the context of the swimming industry. As will be discussed later, an issue emerged in the earlier stages of 
the ethnographic fieldwork: many participants in the case study referred to other people or contexts 
beyond the case when they were interviewed about their day-to-day IT-related work. The problems 
of user-focused, single-setting case studies have been flagged by other researchers. Lamb and Kling (2003) 
argue that atomistic views of technology users are problematic and instead introduce the notion 
of the Social Actor to highlight the role of other external and internal factors in conceptualising the 
user. To study technology users in VSOs as social actors, we need to cross the boundaries of the case; 
hence the previous ‘flat ethnographic’ case study (Williams & Pollock, 2009) should be 
complemented with other techniques to generate data and make sense of more distributed contexts 
(Monteiro, Pollock, Hanseth, & Williams, 2012). 


2 Method 


This project uses the Scottish swimming sector as a case study. The sector relies mainly on the work of 
volunteers, although there is of course some ‘commercial stuff’, such as ‘paid coaches’. While the sector 
benefits from ‘passionate and self-motivated’ volunteers, there are challenges with regard to the continuity 
of their ‘inputs and commitment’. This project is based on qualitative interpretive research (Walsham 2006), 
used here to explore everyday ICT practices and associated social actions (Silverman 1998; Suchman 2007) 
within a volunteer-based swimming context. Initially, an in-depth quasi-ethnographic case study 
of a leading swimming club was designed to shed light on the use of ICTs by volunteers, in particular by 
applying insights generated by Social Informatics studies and IS research. This is being followed by 
a look at ‘extended’ settings and actors, such as a new sports software vendor: since its arrival on the 
market, some people in the club have been under-exploiting the functionality of the existing software in 
the hope that the new system will be procured. 

By moving between the club and other distributed settings and actors, which are emerging 
through snowballing techniques, this study seeks to “grasp the mutual relationships between the local 
real-time accomplishment of practices and the textures that they form and in which they are implicated” 
(Nicolini 2010, p. 1412). 


2.1 Field Setting 


In the first stage of the study, a leading swimming club which operates on a non-profit basis has been 
examined to analyse ‘ICT practices within smaller voluntary organisations’. The club is managed by a 
Management Committee and several professional coaching staff. Its operations are also widely supported 
by various volunteers, mostly swimmers' parents. Volunteers may take on occasional jobs 
such as timekeeping or marshalling swimmers, or they can get involved in more formal positions such as 
pool hiring and fund-raising activities. Those who volunteer in longer-term jobs usually have some kind of 
direct and constant communication with the staff. 

In the second stage of the study, by following the club members’ pointers and by using 
theoretical insights from the Social Actor Model, two other settings were selected whose operations 
have an appreciable effect on the club’s activities. These settings are i) ‘Scottish Swimming’, a national 
governing body which itself uses an online platform to provide a space for integrating competition results, 
and ii) the ‘Market of Swimming Software’, which supplies ICT-enabled solutions for both administrative 
and team-management needs. The introduction of newer systems and the dynamics of the market are both 
associated with a number of challenges for transforming the practices institutionalised around a specific 
technology. 


2.2 Data Collection 


Walsham (1995) has made a simple but helpful distinction between an “outsider researcher” and an 
“involved researcher”. For the first stage of my research, I have been acting more as an involved researcher 
to make better sense of everyday practices within the adopted case, i.e. the swimming club, in particular 
through ethnographic techniques, for example ‘shadowing’ (Czarniawska 2007) the former Head Coach, who 
is already an honorary member of the ‘Board of Directors’. Figure 1 shows his home office, where he and 
his wife enter data into the club’s software. 


Figure 1: Former head coach working with the club’s system 


The ‘involved study’ helped me to dive into issues and challenges surrounding the adoption and use of 
digital technologies by the paid staff and volunteers. For the second part of the research, however, my role 
has tended to be that of an ‘outsider researcher’. To gather rich and relevant data, I have therefore been 
relying more on archival data, such as website comparisons and online forum analysis, as well as on formal 
interviews that engage individuals directly in conversation and generate a deeper 
contextual understanding of the participants' social worlds (Schultze and Avital 2011). 


2.3 Data Analysis 


Data were collected using a variety of tools and techniques, including the recording of interviews and the 
filming of practices. This provided a rich appreciation of ICT-related practices within and around the 
swimming club. To make sense of the generated data, the analysis includes coding the interviews, writing up 
theorised stories from field notes (Golden-Biddle and Locke 2007), and ‘to-ing-and-fro-ing’ between 
generated data and emerging categories to identify key themes. However, as the research is in progress, the 
process of data collection and analysis, especially for the second stage, is still ongoing. 


3 Developing Results and Remarks 


Within about nine months of quasi-ethnographic fieldwork at the club level, followed by a one-month 
study around the club, a couple of issues have emerged. 

The club is planning an “IT Refreshment” because, in contrast to its impressive success in delivering 
swimming training services, it has experienced difficulties in using ICT to minimise time-consuming 
manual work and thus human error. This seems similar to the story within many SMEs (e.g. Ritchie and 
Brindley 2005); what makes this case quite different, however, is that the club's operation is widely 
associated with ‘coming-and-going’ volunteers. Accordingly, the early stages of the fieldwork confirmed 
that there is little or no clear standard or strategy for current and future IT-related jobs. In an 
interview, the coach commented: 

“I have a system I use for my documents ... Liza [a pool-hiring volunteer] has her own system which 
she uses, ... but we don't check to see if we have the same ... and Kayle has no system [laughing]” 


The Committee's vision of expanding the club's operational area has brought to their attention that a 
mismatch between the club's activities and IT-supported solutions could call their future success into 
question. A former member of the Committee noted: 


“ .. the current club's IT system was designed some years ago ... we go to improve the club's 
functional system ... things should go more on-line” 


However, it was noted that the digital technologies were used in different locations such as ‘beside-the- 
pool’, ‘volunteers’ home-offices’ and ‘the club’s less-used office’. 


3.1 Time as Power 


The generated data also suggest that, since the club’s volunteers differ, those who have more 
time to offer may also have more power to shape the general pattern of ICT use. In particular, if a key 
actor decides not to use a collaborative technology, that technological system would probably fail. 
Volunteers who work with the club’s staff are more likely to follow the established working practices, 
while those who are in touch with other volunteers or parents may change current practices, either by using 
new types of ‘everyday technologies’ such as Doodle or Dropbox or through an innovative use of existing 
platforms. This finding in particular is inconsistent with the idea that “[f]ormal volunteering is typically 
carried out in the context of organizations; informal volunteering (which in this context means helping 
friends, neighbours, and kin living outside the household) is more private and is not organized” (Wilson 
and Musick 1997, p. 700). Here, it is argued that even within so-called ‘formal volunteering’ the type of 
volunteer work mediates the organising practices associated with that work. 


3.2 Disposable Projects 


Another issue relates to the challenges surrounding the high turnover of volunteers in smaller-scale 
VSOs. This ‘coming-and-going’ has led to many workarounds, as people have more freedom to choose 
from the portfolio of available technologies. This prevents the club from having an ‘IT Vision’, and hence 
many developments within its IT infrastructure happen by accident. For instance, the club was offered a 
website design about 10 years ago, but the site was not developed much until recently. Then, a couple of 
months ago, a swimmer’s parent offered a re-design of the website. 


3.3 Intended Non-Use 


The established literature suggests that VSOs suffer from a lack of resources, such as a lack of skills. 
However, the emerging data indicate that there are some politics behind the ‘issue’ of skills: 
certain people in this context prefer not to go online because face-to-face communication is a ‘desired’ 
social activity for them. 


3.4 Sources of the Dislocality of Practice 

The fieldwork has also revealed other distributed settings, people and non-humans which have an influence 
on the club’s ICT practices. These include a national governing body for swimming (Scottish Swimming), 
which uses an online intermediary to coordinate and regulate all Scottish swimming competitions 
(SwimScotland.co.uk), and two main software vendors, Hy-Tek and TeamUnify. 


3.5 Infrastructural Coordinator 


SwimScotland.co.uk is a simple but ‘very functional platform’. It has been archiving ‘meet results’ since 
1999. It also provides information about past and upcoming meets. When a meet (a competition) is 
posted on the website, the site also provides a ‘Team Manager’ file for the event. This file is in one of the 
Hy-Tek system file formats used to organise a swimming competition. Clubs need to download the file, fill 
it out with their swimmers' information and send it back to the ‘event organiser’. While the original idea 
of this online platform was to coordinate clubs' exchanges of meet data, it has also 
become an infrastructure which sets ‘standards’ of the ‘acceptable’ format, i.e. Hy-Tek. 


3.6 Functionality vs. Time-Bank 


Hy-Tek has been in the swimming software market since 1989. Many swimming associations, like 
Scottish Swimming, have standardised around the Hy-Tek system. However, newer software is emerging 
which provides real-time data entry as well as ‘Website Management’ facilities. Because of this improved 
functionality, many clubs, including the one I am studying, are facing disagreement among their 
staff members over whether to switch to TeamUnify or keep using the Hy-Tek system. 

Moreover, in the hope that TeamUnify will be procured, some of the club staff do not ‘waste their time’ 
working with a system which ‘seems like the platform is about 15 years old’. TeamUnify has built a feature 
which imports and exports Hy-Tek files; however, it has been claimed that its files ‘are not fully compatible 
with Hy-Tek’. The rivalry between the ‘institutionalised’ Hy-Tek and the ‘new-born’ TeamUnify has produced 
different interpretations among both volunteers and staff, and hence increases the complexity of the 
practices related to organising a meet or managing a team. Some clubs use both systems at the same 
time: Hy-Tek just for getting into the national/local meets and TeamUnify for managing their everyday 
coaching programmes. Many clubs, including the case, might not have much of a problem with the budget 
side of new software; the key barrier is the ‘limited and unplannable’ volunteering time they have 
and the ‘huge time investment needed’ to re-shape all the practices that have become fixed over time. 


3.7 The more Free, the less Useful 


Existing academic studies and industry research show that financial resources are one of the fundamental 
challenges for voluntary organisations. This can stem from the limited nature of such resources or from the 
unpredictability of the money organisations will have. The challenge makes them more conservative in 
spending their money, especially if the proposed expenditure is not particularly urgent or visible. Recently, 
mainly because of advances in information technologies such as the accessibility of Software-as-a-Service 
options and the popularity of web 2.0 applications, there have been growing attempts to envision a better 
future for voluntary organisations, as such technologies are free (or at least unbelievably cheap!) and 
considerably easy to use. The low cost of new technologies is a strong response to the sector’s 
functional challenges with scarce and discontinuous financial resources. However, the investigated story 
of the adoption and use of some of those new technologies shows a degree of contradiction between 
expected and actual results. This ‘unexpected outcome’ comes from the 
lack of the ‘minimum standardisation’ required for a working IT infrastructure. IT projects and tools that 
are cheaper to take up tend to be given up quickly, as take-up may depend on personal preferences 
and situational decisions. Money could bring some sort of ‘discipline’ and, as a result, greater usefulness. 


3.8 The more Simple, the more Chaotic 


As mentioned in the previous section, another feature of newer technologies is their reduced complexity: 
people can quickly grasp their affordances through a couple of hours of training or even some 
trial and error. This is also good news for voluntary organisations with limited staff and a high turnover of 
volunteers. In principle, it enables a considerable shift in the sector in terms of the level and extent to 
which IT-based solutions are used for everyday problems. Despite this promising idea, there are some 
conflicting results from the observed case. Those results identify an ‘increased chaotic’ situation created 
by the use of a wide range of those free technologies, especially because volunteers have considerable 
freedom to get a job done with whatever tool they choose. While simplicity drove a volunteer to find an 
innovative process for accomplishing a task, it also caused disorganised circumstances in which the 
task needed more time and work to be done. Like price, a higher level of system complexity could 
also prevent ‘rushed and less-assessed’ attempts to adopt such technologies. The key argument is that 
the free availability and ease of use of newer technologies do not necessarily result in a better 
IT-empowered situation for voluntary organisations. 


4 Possible Contribution 


This research is expected to contribute to four main domains. Firstly, consistent with the calls for 
multi-locale technology studies (cf. Koch, 2007; Pollock & Williams, 2010), this research generates insights 
into the role of distributed contexts such as governing bodies in the shaping of ICT practices. Secondly, 
although this research is informed theoretically by the social actor model, it extends the logic of the model 
through the application and modifications needed for the context of small VSOs (cf. Lamb 2005). Thirdly, 
taking the swimming sector as a case, the research offers a basis for understanding issues and challenges 
surrounding swimming-oriented (information) infrastructure, in particular the role of Hy-Tek and 
TeamUnify in constructing such infrastructure (cf. Monteiro et al. 2012). Finally, the study is expected 
to support policy-makers and practitioners in (re)defining their ICT plans/programmes. 


Abstract 

This paper conveys one LIS professor’s experience with teaching eight students in a newly minted 
multicultural/diversity course for an ALA-accredited LIS program. The course was taught 100% online 
with a structure that aimed to incorporate as much reflection and interaction as possible due to the 
humanistic nature of the topic of the course. This open-forum approach to presenting the course was met 
with resistance by students in various ways. This research seeks to explore what it means to be an LIS 
educator while simultaneously learning ways in which challenging student discourse in an online context 
impacts learning and, possibly, competent library service in the field. 


1 Introduction 


It is important to consider the fact that in some LIS programs, courses about cultural diversity and cultural 
competency are just coming to the fore in a world that is incredibly multicultural in many, many ways. This 
aspect of multiculturalism is not just socially based but also digitally based with the advent of social media 
platforms, like Facebook and Twitter. Online social platforms have shortened the miles between people all 
over the world in terms of socialization and also with classroom learning environments in many higher 
education programs. In this nuanced, yet deeply contextual environment, working with librarians around 
cultural diversity and competency via an online context (Blackboard) proved to be dichotomous for both 
instructor and students. Taking an ethnographic lens to the teaching of such a course, my experience as 
both a teacher and a learner of ways in which culture, privilege, and various “isms” are discussed and talked 
and written about begs contemplation and consideration. As a woman, a person of color, and someone 
holding a doctorate, based in an urban setting, teaching cultural diversity and competency online to a 
diverse group of European American women was a professional learning experience and story, which must 
be told. Who will listen? Will this single story matter in our LIS/IS world? If so, how? Why? This research 
seeks to unpack these various questions or at least begin to unpack these complex issues of culture, gender, 
identity, power, teaching and learning, that are vitally important to the onward progression of LIS education 
and service in this diverse, 21st century world. 


1.1 Methodology 


This research was conducted as a natural auto-ethnographic exploration teaching a new course to LIS 
students on the topic of cultural diversity, multiculturalism, and cultural competency. The methodology 
included ethnographic notes, teasing out patterns of reflection and response from student work, considering 
student course evaluations, and inquiries into my own lens turned back upon myself to document and 
modify my pedagogical styling and approach. The course was structured within a 10-week quarter term 
with an additional finals week. The course was set up to evoke reflection and deep considerations for student 
identity constructs and the ways in which those constructs influenced their perceptions of library materials 
(books, ebooks, audiobooks, databases, and web resources) and library patrons. Eight cultural groups were 
studied in the course: Native Americans/Pacific Islanders, Latino/Hispanic Americans, African 
Americans/African Diaspora, Rural/Urban, Asian American/Diaspora, the Underserved, Gender (as in 
male, female, boys, girls) and the LGBTQ community. 

The course was primarily an immersive reading project to expose students to a variety of texts 
across reading levels of children’s/juvenile literature, young adult literature, and adult literature. Part of 
the reading project included prescribed titles to be read by the entire class, plus a series of “open pick” 
choices chosen by the students. The purpose of the reading program was to immerse students into the 
literature of multiple cultures to learn more about the life experiences that may mirror the experiences of 
library patrons and professional colleagues. The immersive reading project was submitted in three stages 
during the term, as one-page book reviews in a prescribed format set forth by the instructor. 

Students juxtaposed reading of other cultures with exploring their own diverse identity constructs 
by writing a self-reflective narrative essay in two parts: at the beginning of the term to identify their diverse 
identities, and at the end of the term to reflect on their journey through the course, the good, the bad, and 
the ugly. Weekly discussion forums included interactive tasks that encouraged student conversation about 
topics such as racism, white privilege, the digital divide, on-the-floor library service, selecting library 
resources for diverse communities, and immersing in international librarianship. Weekly tasks usually 
required a one-page response to lecture notes, scholarly readings, web treks, assigned videos (TedTalks, 
BBC, for example) and/or guest lecturers. 


1.1.1 The ethnographic observer 


Ethnographically observing student work included reading and taking notes on patterns of emergent 
responses to assigned tasks. This meant that instructor participation in the discussion forums was minimal, 
for the purpose of learning how the course structure and requirements were working (or not) for student 
learning outcomes. 


1.1.2 Practitioner Inquiry 

I engaged in my own practitioner inquiry to reflectively examine my teaching of the course for the purpose 
of learning what readings, tasks, and assignments worked for students and what did not work. The 
methodology of practitioner inquiry (Cochran-Smith & Lytle, 2009) also allowed me to carry a heightened 
sensitivity towards students’ needs for engaging in the course material and topics that might be emotionally 
uncomfortable for them in various and unexpected ways. As instructor, I kept notes and wrote to my mentor 
as a checks-and-balances approach to engaging in the course as a teacher but also as a learner. 


1.2 Conceptual Framework 


The conceptual framework for this research was triangulated around the concepts of LIS critical theory 
(Buschman, 2007), LIS cultural competency (Overall, 2009; Jaeger et al., 2011), and online learning theory 
(Hughes, 2004). Critical theory in LIS informs the instructor’s pedagogical approach to teaching about 
cultural competency in terms of the ways in which the power of positionality affects class dynamics: the 
instructor is leader and facilitator yet (in this research) a person of color, while the students are 
European American women positioned as learners in a structured virtual environment rather than 
as the requisite leaders reflected in the demographics of American librarianship, which is consistently 
90 percent European American women (Jaeger et al., 2011; American Library Association, 2012). 

The emerging specialization of cultural competency in LIS looks at how, in librarianship, culture 
and knowledge are often seen as separate concepts, whereas in actuality we glean our knowledge based on 
our cultural conditionings and practices. In the LIS online classroom, cultural competency was enacted as I 
presented course information as a holistic package that encased “knowledge” (the multicultural reading 
project), LIS research (weekly readings and tasks), and LIS culture (ongoing professional collaboration and 
discourse in discussion forums), which, applied intersectionally, is often a new approach to librarianship for 
most LIS professionals. Jaeger et al. (2011) charge LIS educators and professionals to do the hard work 
of reflection and discussion to bring the profession up to par for working with 21st century diverse 
populations. Hughes (2004) encourages great care towards online student learners because the focus for 
online education is not the teaching of it, but the learning of it. In this vein, online instructors are continual 
learners, as well. 


2 Conclusion 


The course was taught for the first time during a summer term with a group of eight Master’s LIS students, 
all female, 7 European American, 1 African American, all professionally working in librarianship. Early 
outcomes indicate that the online learning context was a predictably safe space for learning in terms of 
students being able to write and more fully think about learning responses before conveying those responses 
as a part of class discourse. However, the online environment proved to also be a space that limited full 
teacher-student interaction and mutual understanding that face-to-face experience solidly confers. In the 
online environment, teaching and learning about one’s own cultural identity constructs, perceptions, 
assumptions, and biases, and then interacting with others about those ideas, seemed to create a double 
consciousness in students. On the one hand, in-class conversations were cordial, respectful, deeply nuanced 
and thought-provoking in amazing ways. Student writing and thinking as applied to the ongoing reading 
immersion project improved continually for most students as they incorporated instructor feedback 
into cumulative assignment submissions. On the other hand, although final papers and project submissions 
indicated deep student learning, student course evaluations indicated dissatisfaction with course structure, 
instructor “expertise” (or not), and course workload. This was a puzzling outcome not only because in-class 
discourse 
indicated another value of response, but also because most of the students in the course earned the grade 
of “A”, with just one other student earning a “B”, and one student earning a grade of “Incomplete” due to 
a sudden medical emergency. The disconnect between online student-to-student and student-to-teacher 
interactions and the submitted student evaluations leaves room for further practitioner research to tease 
out where the disconnect began and the perceived common ground ended. 


1 Introduction 


The participatory web came of age during a period of extreme market liberalization ushered in by passage 
of the Telecommunications Act of 1996, which has allowed for the proliferation of an astounding array of 
tools and services designed to facilitate communication and the sharing of data among networked 
individuals. The economic model used by most of these companies is to offer a free product or service to 
end users while selling targeted access to advertisers and third party developers (e.g. “Advertise on the 
Yahoo! Bing Network,” n.d., “Advertise with Promoted Accounts - Twitter for Business,” n.d., 
“Advertising on Facebook,” n.d., “Facebook Platform Policies - Facebook Developers,” n.d., “Google Ads,” 
n.d.). This model incentivizes the commoditization of user data on the part of the platform developers, 
which frequently creates tension and confusion among users as they are confronted with advertising and 
algorithmically mediated content that reveals just how much access to their data web services have. 

Notions of web security are frequently learned through storytelling among friends and acquaintances 
(Rader, Wash and Brooks, 2012). The rise of projects like CryptoParty, a decentralized collective attempting 
to make popular encryption tools more accessible to non-experts, suggests there is a growing interest in 
personal management of privacy and web security, and that there is a significant social component to the 
way these practices are learned. I argue that the growing use of encryption and other third-party privacy 
management tools in the United States is a reaction to the largely unregulated digital spaces that have 
become significant sites of social interaction: an effort on the part of users to instantiate a sphere of 
regulation at the level of the individual. This idea is further explored through an analysis of the early growth 
of CryptoParty, a decentralized collective of cryptography enthusiasts attempting to make encryption 
accessible to people outside the hacker community. 

Grounded in an historical overview of telecommunications policy in the United States as it relates 
to the development of computer-mediated communication, I present a preliminary analysis of CryptoParty 
as a site where the work of individual privacy regulation occurs and is taught to others. 


2 Regulation in U.S. Telecommunications 


Telecommunications policy in the United States has been shaped almost entirely by two pieces of 20th 
century legislation: the Communications Act of 1934, and the Telecommunications Act of 1996. The former 
was built upon the assumption that it was in the public interest to operate the telephone network as a 
utility with universal access to all. Unlike many European countries, the U.S. did not adopt a model of 
state-owned telecommunications, instead opting for a regulated monopoly, seen as an economic necessity 
(Bauer, 2009). Constructing a physical telephone network to serve all Americans was a huge capital 
investment, and it was seen as an exchange in good faith for the government to allow AT&T to be the sole 
carrier, as long as universal access to phone service was provided (Aufderheide, 1999). 

The sixty years following the passage of the Act of 1934 were a period of significant technological 
and social change, making the need for an update to the original legislation unavoidable by the 1990s. The 
advent of the internet and explosive popularity of the world wide web precipitated the convergence of 
technologies that had been regulated separately under the Act of 1934. The poor fit of the aging regulatory 
structure and the prevailing ideological disposition toward the benefit of competition in a free market system 
resulted in the highly deregulatory Telecommunications Act of 1996. A massive and complex piece of 
legislation, the Act’s defining feature is the forbearance clause, which enacts a policy of not conducting or 
enforcing regulation that would interfere with the public interest.¹ The underlying assumption here is 
that a competitive marketplace will be of greater benefit to the public than a strictly regulated 
communications industry. 

In the intervening years since 1996, advances in the development of web applications have created 
a complex ecosystem of services that position themselves alternately as platforms for public expression and 
as private businesses, dependent upon which is more convenient for the discussion at hand (Gillespie, 2010). 
As American users have begun to adopt the products of companies like Google and Facebook as methods 
of everyday communication, declining trust and comfort with the way these companies use customer data 
have become commonplace topics of conversation (e.g. Frum, 2012; Oswald, 2012; Paul, 2012). In the next 
section I discuss in more detail the way people are adapting their use of networked communication tools to 
align better with their personal interests. 


3 The Rise of Individual Regulation 


Brunton and Nissenbaum (2011) point out two asymmetries in the power dynamic of collecting user data 
on the web. First, we are rarely able to choose whether or not to be monitored,² nor do we have control 
over where that information goes or what happens to us because of it. Second, most often we do not know 
the full extent of the monitoring taking place. Nissenbaum (2009) asserts that there is no universal 
description of privacy for all people in all situations; that it is a highly contextual state dependent upon a 
multitude of factors, and that private vs. public is a false dichotomy. Palen and Dourish (2003) emphasize 
the dynamic and multidimensional nature of privacy as experienced by individuals managing presence in 
networked settings, drawing attention to the importance of taking into consideration broader social and 
institutional settings when discussing privacy concerns raised by new technologies. 

Technological ‘fixes’ for privacy abound: numerous researchers and technologists have taken up the 
challenge to design and build technological tools that help people to manage and customize the visibility of 
their digital activity. Examples of such tools include PGP encryption for email (Garfinkel, 1994); Tor, which 
allows for anonymous browsing by routing traffic through multiple nodes in a secure network (Tor Project, 
n.d.); and more recently, TrackMeNot, a browser plugin designed to obfuscate internet activity in order to 
confound tracking cookies (Howe & Nissenbaum, 2009). Despite the proliferation of tools and the increasing 
sophistication of privacy settings built into commonly used apps and services, these tools and settings have 
repeatedly been shown to be difficult for users to configure and implement as desired (boyd & Hargittai, 2010; Sheng, 
Broderick, Koranda, & Hyland, 2006; Whitten & Tygar, 1999). 


1 A term of art in the policy sphere, ‘public interest’ has notoriously defied stable definition (Krasnow & Goodman, 1997; Schultze, 
2008), yet has been a central characteristic of U.S. communications policy since the 1927 Radio Act. 

2 Brunton and Nissenbaum reject the plausibility of the argument that a user can opt out of using a service if they are uncomfortable 
with the data capture policy (as does Portwood-Stacer, 2012), and I agree. While technically possible, there is often a substantial 
social cost to withdrawing from these services. 

To use Brunton and Nissenbaum’s (2011) terminology, growing awareness of these tools has resulted 
in an increasing amount of vernacular resistance to the economic model of data capture. Methods of 
vernacular resistance entail individuals engaging in small acts of subversion to better align their use of a 
system with their personal tolerance for data collection. In the sections below I discuss first the experience 
of developing data management strategies from an individual perspective, followed by the creation of 
CryptoParty, a global effort to provide people with the skills to decide upon and engage in their own forms of 
vernacular resistance. 


3.1 Individual Data Management 


Greg is an American in his late twenties with a bachelor’s degree in information technology from a large 
public university. In the fall of 2011 he became involved with the Occupy movement, participating in protest 
activity in a major American city over a span of several months. Given this experience, his perspective on 
the use of encryption is likely more sophisticated than that of the typical internet user: he has not only 
developed tactics for managing his own digital presence, but was also an active participant in the development 
of information security strategies within the Occupy movement. 

While he has always been interested in computers and computing, Greg was a relative latecomer 
to the participatory web. Until 2010, he identified himself primarily as a lurker; someone who would surf 
the web, observing and consuming information but not creating any content. During this time he was not 
a member of Facebook or other social network sites, and describes the extent of his internet activity as 
“surfing cool websites, looking at interesting videos and just using email.” 

When WikiLeaks gained publicity in 2010 over the release of U.S. State Department diplomatic 
cables, Greg was curious about the technology that made such a thing possible. It was at this point that he 
first learned about Tor and PGP encryption, which are used to anonymize traffic within a network and 
scramble the content of messages, respectively. Motivated politically and by his interest in the technology, 
Greg became involved in the community of hackers developing, using, and advocating for encryption. With 
the knowledge gained through this involvement and the tools at his disposal to have more control over his 
web presence, Greg became much more active online. Today he uses Facebook and Twitter, and maintains a 
blog. He uses some form of encryption (or, where that is not possible or plausible, obfuscation) with each of these 
tools. 

He clearly differentiates between security and anonymity, and the realistic possibilities for each 
when using a computer versus a mobile device. Because he uses Twitter on his smartphone, which carries a 
regulatory legacy that his laptop does not, his service provider and the government are able to connect that 
activity to his legal identity. 


Because I use twitter on my Android device, [the government] knows exactly who I am, especially 
because I’m involved in [Occupy]... and in a way, this is kind of a learning experience for me 
because...if I really wanted to maintain some anonymity, I would go about it in a completely 
different way, which would involve a complete new set of practices. 


He has decided that this possible disclosure of his legal identity is an acceptable level of risk, and proactively 
manages this potential risk by self-regulating what he posts to Twitter. Anything considered especially 
sensitive is sent through other, usually encrypted, channels. It is through this active, conscious process of 
managing multiple data streams that Greg is able to fill in regulatory gaps (in his perception and 
expectation) in order to make his digital activity visible to only the audience he intends. In the absence of 
policy defining a boundary, he is creating and policing his own. 


3.2 CryptoParty 


Boullier acknowledges the common perception among developers and technologists that users are generally 
unwilling to adopt new practices, but also asserts that “users are in fact ready to go to remarkable lengths 
to adapt their ways of doing things, providing they are... given clear, decisive and reliable instructions” 
(2001, cited in Boullier, Jollivet, & Audren, 2007, p. 1278). CryptoParty is one example of a 
community-based effort to provide such instruction for people interested in learning how to use basic encryption tools. 

On its wiki, CryptoParty defines itself as “interested parties with computers, devices, and the desire 
to learn to use the most basic crypto programs and the fundamental concepts of their operation! 
CryptoParties are free to attend, public, and are commercially and politically non-aligned.” Gatherings are 
organized ad hoc by volunteers in each city; as of this writing, there have been more than 100 gatherings 
on five continents since the first was held in Australia in August 2012 (“CryptoParty,” n.d.). It is 
noteworthy that one of the first things mentioned in a statement about how to contribute is the request to 
“use language and methods an absolute newbie can understand” (“CryptoParty Handbook,” n.d.). This 
openness to non-experts is not something that geek and hacker communities are known for (Coleman, 2012), 
and could be indicative of a larger cultural shift in these groups as privacy and surveillance become more 
quotidian concerns. 

To date, this research has focused primarily on CryptoParty as an organization. The next phase of 
this project will include fieldwork at CryptoParty gatherings to better understand the motivations and 
experiences of non-experts as they learn to take a more active role in their personal data management. 


4 Conclusion 


In this paper, I have traced the history of policy decisions that have shaped the current market-driven 
landscape of communication in digital spaces, and described how technical solutions designed to address 
privacy concerns in these spaces have been largely unsuccessful in attracting users. CryptoParty has been 
presented as a case where people with knowledge of current privacy management tools are reaching out to 
help those with less expertise. This outreach expands not only the reach of these tools, but awareness of 
the options available for privacy management among the general public. 

I argue that individual management of digital exposure and concealment can be conceptualized as 
an instantiation of micro-regulation in the absence of regulation at the national (or international) level. 
This paper does not address whether regulation at a higher level would negate the perceived need for 
third-party privacy management, nor whether similar activity is happening outside the United States, but these 
are important questions for consideration in future work. This paper does illustrate strategies by which 
individual users are beginning to take control of their personal data management, and how CryptoParty 
has developed as a way to overcome longstanding usability hurdles for third-party privacy management 
tools, particularly encryption. 

This is a preliminary description of the actions taken by individuals in response to discomfort or 
disagreement with the practices of data capture performed by most major platforms of communication on 
the internet. My intent is to open discussion about privacy to interests and agents outside the usual suspects 
of governments and corporations; to investigate how privacy and security are negotiated at a human scale. 


Abstract 

In research projects, data collection and dissemination are considered two discrete and independent 
activities. The focus is on the research question, and not on how best to collect, present, and subsequently 
share data. Although most US funding agencies now require that researchers share data, the tools 
available to operationalize this requirement are lacking. We show how the open source MediaWiki 
system can provide a lightweight, collaborative, and inexpensive tool to support new data sharing 
practices. This note serves to illustrate how interactive data collection and dissemination supported by 
a Wiki server can be used by scientists both during the project and for subsequent dissemination. 
Keywords: data sharing, information reuse, collaborative information systems, scholarly communications 

Citation: Evans, C. S. (2014). Wiki as a Platform - Turning Dissemination into Collaboration. In iConference 2014 Proceedings 
(p. 820-826). doi:10.9776/14404 

Copyright: Copyright is held by the author. 

Acknowledgements: Project completed as part of Preserving Virtual Worlds 2 (PVW2). PVW2 was made possible through 
National Leadership Grant #LG-06-10-0160 from the Institute of Museum & Library Services. Thanks to Drs. Jerome McDonough 
(PVW2), Michael Twidale (Academic Advisor), and Catherine Blake for academic feedback and editorial advice. 
Contact: csevans2@illinois.edu 


1 Introduction 


Research does not get performed in a bubble. At some point, the findings and results of the research need 
to be published and made accessible. This is a basic tenet of the ‘paradigms’ or ‘normal science’ as defined 
by Kuhn (Kuhn, 2012). The traditional mechanism to achieve this goal is a published scientific article. As 
a distribution mechanism the journal, conference proceeding, and book have been the primary means of 
distributing knowledge. However, developments in scholarly practice such as FORCE11¹ (Bourne et al., 2012), 
which advocates new ways to work on an article, and nanopublishing (Sofronijević & Pavlović, 2013) are 
changing the way that scientists disseminate information after a project is complete. 

In this paper we introduce a way to leverage the same platform used during the collaborative 
research process to concurrently create a collection suitable for external publishing. 


A common provision among grant funding agencies is that a “proposal budget may request funds 
for the costs of documenting, preparing, publishing or otherwise making available to others the findings and 
products of the work conducted under the grant” (National Science Foundation, 2013). For many projects 
this is a straightforward process that can use an institution’s online presence such as the IDEALS² 
centralized storage system, a personal, project, or departmental webpage, or journals and conference 
proceedings. However, for some projects a more interactive means of 
distributing data is required, and that often takes the form of a Wiki. 

While this is appropriate for disseminating results, it doesn’t capture the growing need for 
interactive collection, analysis, and distribution of information both during a project and after it is 
complete. 


1 www.force11.org - “a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help 
facilitate the change toward improved knowledge creation and sharing. Individually and collectively, we aim to bring about a change 
in modern scholarly communications through the effective use of information technology.” 

2 IDEALS - https://ideals.illinois.edu 

Collaborative research is not new. Harrison and Dourish (1996) make a distinction 
between space (the physical location where work occurs) and place (where work is done). Traditionally, 
scientific collaboration required collocation (space) and the use of well-defined or ad-hoc tools (place). 
Twidale and Nichols (1996) showed that while collaborative systems exist that allow 
people to access common information sources, the designers of these systems do everything they can to 
make it feel like the user is using an individual resource. Scientists do work well together, but as reported 
by Blake and Pratt (2006a), the process of collaborating would be enhanced by tools that allow 
better sharing of information. Studies in Collaborative Information Seeking (Karunakaran & Reddy, 2012) 
as well as Blake’s Collaborative Information Synthesis study have shown that current support focuses on the sharing of 
documents and the integration of extracted facts, and goes no further. We propose that collaboration should 
extend beyond documents, extracted data, and annotations to data collection and other activities that allow 
users to work together at different workplaces. 

In the current research model, funding is predominantly for a fixed period of time, and while data 
management plans are increasingly asking for how data and research will be distributed and archived post- 
project, this is not the focus area of a project proposal and not an area where limited resources are typically 
allocated. 

Adding complexity is the collaborative nature of projects, involving research from multiple public 
and private institutions, and the individual involvement of non-affiliated members of the general public. 
People from outside the Principal Investigator’s institution may not only be authoring or commenting on 
research in progress, but might be actively contributing edits, data, or annotations to the research. While 
ad-hoc tools are often brought together to facilitate collaboration, this is frequently done as an afterthought, and 
the result is often a mix of online or cloud based solutions such as a Wiki or Google Docs. The Wiki is used to share 
information in a consumptive manner, while Google Docs is used to edit collaboratively. 


1.1 Motivating Example 


The Preserving Virtual Worlds 2 Project³ (McDonough et al., 2010) was a funded research initiative with 
a defined timeline. The outcomes from the project included a survey tool to continue gathering information, 
as well as an ability to disseminate information. The general structure of the data collection is along the 
lines of a survey. The obvious choice is to use a survey engine such as Survey Monkey, but who collects 
the data for analysis in the future? Ongoing maintenance of a system is not uncommon after a project’s 
funding is complete. So how can this be achieved? 

While the PVW2 project motivated this project, the Wiki as a platform concept was a very small 
part of the research and served the purpose of survey data collection only. While building the system, the 
author realized the potential for other research projects and has started to explore the use of Wikis as a 
collaborative dissemination mechanism. 


2 Description 


The use of Wikis in research is commonplace. A Wiki allows authors to publish information, allows 
users to search and browse that information, and supports collaborative editing of text documents as well as a 
modicum of discussion. However, this is largely a publication model, with little interaction with the data. 
We needed a solution that was simple, lightweight, had low ongoing maintenance overhead, was 
secure, and could be sustained without the need for programming resources. 
To satisfy these goals, along with the need for post project longevity, we turned to open source 
software that could be customized with minimal effort. 


3 PVW2 focuses on determining significant properties for a variety of educational games and game franchises in order to provide a set 
of best practices for preserving the materials through virtualization technologies and migration, as well as provide an analysis of how 
the preservation process is documented. 

The platform we based our solution on was MediaWiki⁴, the same Wiki engine used by Wikipedia. 
This platform was chosen because it presented a number of immediate benefits. 


• Lightweight 
MediaWiki uses a lightweight LAMP (Linux, Apache, MySQL, PHP) style application stack. For 
our purposes, we used Apple’s OS X operating system, with its internal web server (which is Apache 
based), a MySQL database, and PHP as the database interface language. Internal pages were 
developed using the standard Wiki markup language and HTML/CSS with JavaScript for the 
enhanced data entry pages. 

• Low Maintenance Overhead 
The MediaWiki server is the basis of Wikipedia, and as such has been demonstrated to scale 
exceptionally well. As the underlying Wikipedia server, the software is actively supported and 
regularly patched. This has the advantage of reducing the burden of requiring a software engineer 
to maintain and upgrade code. Content is displayed using a combination of standard Wiki markup 
language as well as HTML/CSS and JavaScript via common plugins. 

• Secure 
The inherent security model of MediaWiki was another attractive feature in that a dedicated 
systems administrator was not required for the server, and security was largely self managed by the 
system. Dedicated logins to internal resources were not required, with nominated users being able 
to grant/revoke user privileges. To enhance use in an academic setting, the user signup page can be 
readily customized to include wording for IRB acceptance as well. 

• Simplicity and familiarity 
With those major functions addressed, the fourth criterion to be addressed was simplicity. The 
advantages of this approach were twofold: 
1. The navigation model was familiar to the user community; 
2. The Wiki markup schema is well established and well documented. 
Previous solutions were over-engineered and added a level of complexity to the task of data collection 
that was inappropriate for the task and the target audience. 

• Extendibility 
While not a standard feature, the ability to use HTML and CSS within pages is available for MediaWiki 
through a readily available plugin. This allowed us to create flexible and dynamic survey forms that would 
normally require a webserver to host, and access to the file structure to maintain. By embedding 
HTML into MediaWiki, we are now able to update survey pages without the need to have backend 
server access. 
To create an interactive server that was more than a publishing platform we also installed the 
following extensions: 
o Cite and SpecialCite - enhanced citation handling 
o SecureHTML - allows embedded HTML/CSS in wiki pages 
o Vector - adds the familiar Wikipedia UI 
o WikiEditor - enhanced word processor style page editing 

• Cheap 
While listed last, with limited resources in a budget, this is a major advantage. A Wiki based system 
is generally open source.⁵ The system will run on low cost servers utilizing a LAMP (Linux, Apache, 
MySQL, PHP) approach that is well understood (no specialized training for staff to install, 
maintain, and use), has readily available support (online forums from a very active user community), 
and has regular maintenance and upgrade releases (minimizing developer overhead, and increasing 
security through user contributed patches). This adds up to being a small line item in a budget 
instead of a major undertaking. 


4 www.mediawiki.org 
5 Commercial Wiki systems are available; however, the majority of Wiki systems are open source and free for academic use. 



Figure 1: Example data collection form using HTML, CSS, and JavaScript. The screenshot shows the Game Player 
Questionnaire page (Category: Survey), which asks, among other questions, for the game’s title and genre and whether 
the respondent has played, created, or looked for mods for the game. The annotations on the figure read: 
• Standard Wiki navigation and layout. The look and feel of the web page is controlled by the user-selected Wiki 
template, which allows for use by screen readers with an appropriate template. 
• Security is controlled via the Wiki security model. The page shows someone with Admin access, allowing them to 
make changes to the code on the page if necessary. 
• Wiki Categories are used to enable enhanced content settings for security. 
• Completion/suggestion for data entry is based on information already in the database, to minimize data entry errors. 
• Dynamic content is controlled via standard JavaScript and CSS/HTML settings. Text entry blocks only appear as required. 
• HTML based buttons save data back to the Wiki database, allowing ready integration into existing Wiki pages for 
reporting. They also allow for error checking, missed fields, and data entry validation. 


3 Discussion 


The traditional use of a Wiki is that of a publishing platform that allows the content to be collaboratively 
edited and comments made, with an audit trail of changes. For many projects this provides an adequate 
level of control, and provides a means to consolidate changes within a collection of documents. 

The HTML and CSS plugins for Wiki are intended for enhancing layout using standard web based 
techniques, but HTML is not purely for display. Adding in these plugins opens up options that are not 
generally considered for a Wiki based platform. 

Using forms and server side scripting such as PHP, it is possible to create interactive websites. With an 
underlying MySQL database, form data can be written to the same database and tables that the Wiki is pulling 
from to display information. A further advantage is that the survey can write data to the MediaWiki 
database, dynamically pull information from the database (Wiki pages), and use the MediaWiki 
interface to display data. By embedding these within a Wiki framework we now have a platform that is 
suitable for both the publishing and collection of data, and displaying dynamic content. 
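
The idea can be illustrated with a minimal sketch in Python, using an in-memory SQLite database as a stand-in for the MySQL database behind MediaWiki. The `survey_response` table, its columns, and both function names are hypothetical; they are not part of MediaWiki's schema or of the PVW2 system, and the genre value is illustrative only. The sketch shows a form handler writing an answer into a table in the shared database, and a second function reading the rows back as a wiki-markup table that a Wiki page could display.

```python
import sqlite3


def save_response(conn, game_title, genre):
    """Store one survey answer in a (hypothetical) table next to the wiki's own tables."""
    conn.execute(
        "INSERT INTO survey_response (game_title, genre) VALUES (?, ?)",
        (game_title, genre),
    )
    conn.commit()


def responses_as_wikitext(conn):
    """Render the stored answers as a wiki-markup table for display on a wiki page."""
    rows = conn.execute(
        "SELECT game_title, genre FROM survey_response ORDER BY id"
    ).fetchall()
    lines = ['{| class="wikitable"', "! Title !! Genre"]
    for title, genre in rows:
        lines.append("|-")
        lines.append(f"| {title} || {genre}")
    lines.append("|}")
    return "\n".join(lines)


# SQLite in memory stands in for the shared MySQL database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey_response ("
    "id INTEGER PRIMARY KEY, game_title TEXT NOT NULL, genre TEXT NOT NULL)"
)
save_response(conn, "Little Big Planet", "Platform")
print(responses_as_wikitext(conn))
```

Parameterized queries are used so that free-text survey answers cannot alter the SQL statement, which matters when a public form writes into the same database the Wiki reads from.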

This combination of plugin and wiki server has a number of other advantages that can also be 
leveraged. 


• Dynamic Data Displays 
The web has become interactive. Data is now dynamically graphed, charted, plotted and 
manipulated through web based interfaces using a number of techniques such as the Data Driven 
Documents library (d3js.org). These interfaces can be readily incorporated into a Wiki page. This 
has the advantage that the data presented in a wiki is no longer static, but is a living entity that 
can be visualized by the user in ways that were not originally conceived. 

• External data sources 
A Wiki is typically limited to the data that is contained within its database. A web based solution 
does not have those limitations, and can pull data from sources outside the Wiki database. While 
not searchable as part of the Wiki index, this information could be used to enhance the data within 
a wiki page to provide context or supplemental data. 

• Template Based Displays 
The wiki is primarily a publishing platform. Content is displayed 
according to the deployment of templates. This template driven approach has two 
advantages. 
1. Mobile Sites: By detecting the browser, different template options can be loaded. This allows 
for the same content to be displayed in formats that are appropriate to both desktop style 
clients as well as mobile clients. With the relevant HTML embedded within the template code 
for a Wiki page, information can be displayed to mobile devices in a form that is more 
appropriate to that platform, while desktop computers can have a richer presentation. 
2. Accessibility: By deploying a template that adheres to accessibility standards, those with 
visual or mobility disabilities will be able to access and participate using the same underlying 
data as others within a project. 
By allowing a user to choose the appropriate template for their usage needs (standard, mobile, 
accessible) the system can be used by a wider audience. 

• Data Verification and Quality Control 
Surprisingly, a standard Wiki installation has little in the way of form controls. By using a security 
model we can limit who can enter data (verified project team members), who can edit data (verified 
project team members), who can comment on the data (users with an account), and who can read 
the data (anyone). Data entered and made visible in this form allows for peer review and 
verification, and importantly, because changes are logged, an edit history of changes is also 
maintained. 
Beyond peer review, simpler mechanisms also become available once we incorporate an interactive interface 
into the Wiki system, such as using JavaScript to perform edit checks prior to saving data, or Ajax to look up 
a field in a database and autocomplete a response (minimizing transcription errors). 
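
The two checks mentioned here, an edit check before saving and an autocomplete lookup, can be sketched as plain functions. The sketch below is in Python for brevity rather than the JavaScript and Ajax used on the survey pages, and the field names, function names, and sample titles other than the one shown in Figure 1 are assumptions made for illustration.

```python
REQUIRED_FIELDS = ("title", "genre")  # hypothetical field names for the survey form


def missing_fields(form):
    """Edit check: return required fields that are absent or blank, in declaration order."""
    return [name for name in REQUIRED_FIELDS if not form.get(name, "").strip()]


def suggest(prefix, known_values, limit=5):
    """Autocomplete: case-insensitive prefix match against values already stored."""
    wanted = prefix.strip().lower()
    if not wanted:
        return []
    matches = sorted(v for v in set(known_values) if v.lower().startswith(wanted))
    return matches[:limit]


# Illustrative values only; in practice these would come from the database.
known_titles = ["Little Big Planet", "Little Big Planet 2", "Portal"]
print(missing_fields({"title": "Little Big Planet", "genre": " "}))
print(suggest("little", known_titles))
```

Offering existing values as suggestions keeps spellings consistent across respondents, which is the sense in which autocompletion reduces transcription errors.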


4 Future Work 


Grudin’s Eight Challenges for Groupware Developers (Grudin, 1994) posited that there was a “disparity in 
work and benefit ... often requir(ing) additional work from individuals who do not perceive a direct benefit 
from the use of the application.” At the moment the people who are doing the work are the scientists, and 
the people who get the reward are scientists in other groups. By repurposing existing software in order to 
support collaboration during the project and subsequent dissemination, those creating the content get the 
benefits during the project, and the results are already in a form suitable for public dissemination. 

While the PVW2 project is completed, the Wiki as a Platform concept has been included in a 
number of upcoming research initiatives. We are hoping that by using the MediaWiki system in combination 
with the described plugins, we will have a platform that we can further tailor with minimal effort to not 
only act as an interactive portal for the collection of data, and the presentation and explorations of data, 
but will also provide a platform to gauge usage patterns, and ascertain which pages are the most important 
to different types of users. 

The projects under review will make use of the Data Driven Documents paradigm, creating 
visualizations that will help scientists to explore the data they are contributing, and will have a template 
driven interface so that they display appropriately on mobile devices as well as desktop computers without 
the need for a dedicated mobile client. 
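
One way the template selection could work is sketched below. This is an illustrative assumption rather than a description of MediaWiki's skin mechanism: the function name and the list of user-agent markers are hypothetical, and only the three template names (standard, mobile, accessible) come from the discussion above. An explicit user choice takes priority, and browser detection is used only as a fallback.

```python
TEMPLATES = ("standard", "mobile", "accessible")
MOBILE_MARKERS = ("mobile", "android", "iphone")  # assumed substrings, not an exhaustive list


def choose_template(user_agent, preference=None):
    """Pick a display template: the user's explicit choice wins, else detect the browser."""
    if preference in TEMPLATES:
        return preference
    agent = (user_agent or "").lower()
    if any(marker in agent for marker in MOBILE_MARKERS):
        return "mobile"
    return "standard"


print(choose_template("Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)"))
print(choose_template("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9)"))
```

Letting the stored preference override detection means a user who needs the accessible template receives it on any device.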


5 Conclusion 


Collaboration should mean working together. The location is immaterial provided that the tools and 
resources at hand allow people to contribute to the common project, comment on each other’s work, and 
share results both with those in their project and with a wider audience. 

The use of the Wiki framework presented also has the advantage of allowing scientists multiple 
views of the same data, and a level of transparency in data entry that could also help to ensure data quality. 
Streamlining the process and integrating the research and publication will make it easier for scientists to 
collaborate during a project and disseminate their work after. 


6 References 


Blake, Catherine, & Pratt, Wanda. (2006a). Collaborative information synthesis I: A model of information 
behaviors of scientists in medicine and public health. Journal of the American Society for 
Information Science and Technology, 57(13), 1740-1749. 

Blake, Catherine, & Pratt, Wanda. (2006b). Collaborative information synthesis II: Recommendations for 
information systems to support synthesis activities. Journal of the American Society for 
Information Science and Technology, 57(14), 1888-1895. 

Bourne, PE, Clark, T, Dale, R, de Waard, A, Herman, I, Hovy, E, & Shotton, D. (2012). Improving 
Future Research Communication and e-Scholarship. Force 11 Manifesto. 

Grudin, Jonathan. (1994). Groupware and social dynamics: eight challenges for developers. 
Communications of the ACM, 37(1), 92-105. 

Harrison, Steve, & Dourish, Paul. (1996). Re-place-ing space: the roles of place and space in collaborative 
systems. Paper presented at the Proceedings of the 1996 ACM conference on Computer supported 
cooperative work, Boston, Massachusetts, United States. 
http://dl.acm.org/citation.cfm?doid=240080.240193 

Karunakaran, Arvind, & Reddy, Madhu. (2012). The role of narratives in collaborative information 
seeking. Paper presented at the Proceedings of the 17th ACM international conference on 
Supporting group work, Sanibel Island, Florida, USA. 

Kuhn, Thomas S. (2012). The structure of scientific revolutions: University of Chicago press. 

McDonough, Jerome, Olendorf, Robert, Kirschenbaum, Matthew, Kraus, Kari M, Reside, Doug, Donahue, 
Rachel, . . . Rojo, Susan. (2010). Preserving virtual worlds final report. 

National Science Foundation. (2013). Proposal and Award Policies and Procedures Guide: Part I - Grant 
Proposal Guide. (OMB Control Number: 3145-0058). Retrieved from 
http://www.nsf.gov/publications/pub_summ.jsp?ods_key=gpg. 

Sofronijević, Adam, & Pavlović, Aleksandra. (2013). Applicability of the nano-publication concept for 
fostering Open Access in developing and transition countries. 

Twidale, Michael, & Nichols, David. (1996). Collaborative browsing and visualisation of the search 
process. Paper presented at the Aslib Proceedings. 


7 Table of Figures 
Figure 1: Example data collection form using HTML, CSS, and JavaScript 


How Databases Learn 


Andrea K. Thomer¹,² and Michael B. Twidale² 


¹ Center for Informatics Research in Science and Scholarship, University of Illinois at Urbana-Champaign 
² Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign 


Abstract 

The relational database has been a fixture of the modern research laboratory -- used to catalog and 
organize specimens and petri dishes, as well as to organize and store research data and analyses. Yet, 
though there are numerous textbooks on database design and short-term maintenance, there is still a 
need for deeper exploration of how these artifacts change, grow and are maintained in the long term, and 
how their very structure can affect their users’ work. Findings from a deeper, more extended exploration 
of database use over long periods of time would have implications for not just data curation, preservation 
and management, but also for our understanding of actual, situated information organization practices 
and needs in science: designing for actual practice rather than for unrealistic idealization of these practices 
and needs. We draw inspiration, and our title, from Brand’s highly influential book, “How Buildings 
Learn” (1995). We believe many of the topics Brand discusses regarding buildings’ change and growth 
over time might usefully be applied to certain aspects of databases. This work is a first step towards 
understanding how databases, like buildings, learn. 


1 Introduction 


For at least the last 40 years, the relational database has been a fixture of the modern research laboratory 
-- used to catalog and organize specimens and petri dishes, as well as to organize and store research data 
and analyses; Manovich goes so far as to call them the “key form of cultural expression” in the computer 
age (1999). Yet, though there are numerous textbooks on database design and short-term maintenance, and 
a fair amount of LIS and CSCW literature exploring people’s use of, and on-going collaboration around, 
databases, there is still a need for deeper exploration of how these artifacts change, grow and are maintained 
in the long term, and how their very structure can affect their users’ work. Findings from this deeper, more 
extended exploration would have implications for not just data curation, preservation and management, 
but also for our understanding of actual, situated information organization practices and needs in science: 
designing for actual practice rather than for unrealistic idealization of these practices and needs. 


1.1 How databases, like buildings, learn 
We draw inspiration, and our title, from Brand’s highly influential book, “How Buildings Learn” (1995). 
We believe many of the topics Brand discusses regarding buildings’ change and growth over time might 
usefully be applied to certain aspects of databases. To illustrate, we list here some issues in the book that 
have promise as provocative analytic lenses to apply to studies of databases: 


• Consideration of the database beyond its initial construction over timescales of years and decades, 
  admitting at least the possibility that growth and use might last for centuries 

• Consideration of gradual evolution of the database over time; growth, accretion of extra parts, slow 
  changes in how it is used, accommodations to new technologies 

• Periodic radical repurposing of the database, changing understanding of what it is ‘for’ 

• Shearing: how different aspects of the design may need to change at different rates 

• Consideration of the database as a resource that can be better understood by how people use it, 
  not simply as a designed artifact that can be analyzed by looking at it without any people around 

• Acknowledging that people will appropriate the database, often violating the pure design intents of 
  its architect 

• Tensions between a carefully thought out architecture and a more vernacular style of initial creation 
  and modification 

• Tensions between rigid control of form and use enforcing consistency and reliability versus more 
  adaptable and responsive but idiosyncratic evolving use 

• Contrasting a doomed attempt to design the database right in the first place with designing to 
  make it easier to modify as circumstances change 


In this paper, we present preliminary steps toward what Schuurman calls a “database ethnography” (2008), 
similar to Geiger and Ribes’ trace ethnography (2011): a detailed examination of changes of database table 
structure and schema over time, via a case study detailing the migration of a relational database from one 
system to another. The database under study is the Universal Chalcidoidea Database -- a long-lived natural 
history database containing nomenclatural data about a large superfamily of wasps. This study could be 
used to inform database design, database curation, and our understanding of how people 
collaboratively use, alter and maintain information structures to do work. Research questions guiding this 
work include: 


• How do databases in general, and relational databases in particular, shape the work or research 
  that is done with them? 

• How do database structures or schemata affect the work that is done with them both at the time 
  of their creation and long after? 

• What are the recurrent dilemmas in collaborative database design, use and maintenance? 

And finally, 

• How does a database learn? 


2 Prior Work 


Prior work exploring database use over time falls primarily into two camps: the ethnographic and the 
formal. In the former category, Hine’s 2006 study describing the creation of a mouse genome database, as 
well as Bietz and Lee’s excellent 2009 ethnography of a metagenomics database’s use and development are 
motivating touchstones for this work: many of our research questions were inspired by their work. However, 
we’re interested in exploring database use, and its effects, at a longer time scale than traditional ethnography 
would necessarily allow. We are also interested in exploring whether the phenomena these scholars describe 
continue to be found at that longer time scale. For instance, Hine finds that databases do not seem to 
fundamentally change how scientific work is done; we wonder if this continues to be the case for long-lived 
databases: databases that continue to be used two, five, ten and more years after their development. 

On the formal end, MacKenzie’s 2012 exploration of “how databases multiply” also seeks to answer 
research questions similar to ours, but from a mathematical perspective. MacKenzie frames the growing 
need for aggregation in and through databases via an exploration of "multiples" -- the intersection, division 
and making of sets as a form of world making -- relying heavily on Alain Badiou’s “philosophical effort to 
articulate mathematics,” specifically set theory, “as an ontology.” Through his "mathematical orientation," 
MacKenzie seeks to show how databases "wrest inclusion from belonging" -- or make present social orderings 
through mathematical groupings. Though MacKenzie's insights are compelling, here we are more concerned 
with the day-to-day social interaction with sets: what is it for a non-mathematical human brain to adopt 
and maintain a set theoretic style of "relational thinking" to do work over time? 

Manovich (1999) similarly discusses databases as sets of objects or datums -- a way of representing 
the world “as a list of items which [the database] refuses to order.” Manovich’s characterization of a 
database’s contents is an idealization based on set theory: the database provides unordered access to an 
item or set of items, which are retrieved via a structured query, as opposed to devices like catalog ledgers, 
which organize information according to a chronological narrative. 

We counter, though, that while databases may not impose a specific order, the humans entering 
data into said database leave an archeological ordering or narrative behind -- both explicitly and implicitly. 
Explicit “narratives” include fields that mark the date of a record’s creation, creator and last modification. 
Implicit narratives, on the other hand, are found in subtle changes to the ways that the database is used 
over time: the process of creating an entry may subtly change (new entries may be more or less complete 
than old); data entry conventions may shift (new users may bring with them new shorthand); and new 
fields may be added to existing tables, which would remain empty for older records. Databases such as 
wikis, which preserve their full unaltered histories, contain both explicit and implicit narratives, and 
consequently support commentary on changes in usage and conventions over time. Thus, we argue that the 
idea that database records are atemporal in their ordering is misleading -- as those who actually use the 
database are typically perfectly well aware. All databases contain some trace of their history to a greater 
or lesser extent, and older databases obviously have more history to leave a trace. These implicit and explicit 
narratives make trace ethnography feasible (Geiger & Ribes, 2011). 
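
The kind of implicit narrative described above can be illustrated with a minimal sketch. It assumes records are available as dictionaries carrying an explicit creation date; the function name, the field names ("created", "name", "host") and the sample records are all invented for illustration and do not come from any database discussed here. The sketch reports, for each field, the earliest creation date on which that field holds a value, which is one rough way to spot a field that was added after older records were entered.

```python
from datetime import date


def first_use_of_fields(records, timestamp_key="created"):
    """Map each field to the earliest creation date on which it holds a value."""
    first_seen = {}
    # Walk the records in creation order so the first hit per field is the earliest.
    for record in sorted(records, key=lambda r: r[timestamp_key]):
        for field, value in record.items():
            if field == timestamp_key or value in (None, ""):
                continue  # skip the timestamp itself and empty fields
            first_seen.setdefault(field, record[timestamp_key])
    return first_seen


# Invented sample records: the hypothetical "host" field is empty in the oldest record.
records = [
    {"created": date(1992, 3, 1), "name": "taxon A", "host": None},
    {"created": date(2004, 1, 20), "name": "taxon C", "host": "host Z"},
    {"created": date(1999, 6, 12), "name": "taxon B", "host": "host Y"},
]
trace = first_use_of_fields(records)
print(trace)
```

A field whose first use falls well after the oldest record is only a hint, not proof, that the field was introduced later; it could equally reflect a change in data entry conventions.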

We believe (as Baker and Bowker did before us, in a closing note to their 2007 work on information 
ecology) that applying Brand’s considerations of how buildings “learn” to databases will help bridge 
ethnographic and formal approaches, and provide an interesting framework with which to examine use over 
time. Brand’s perspective brings some of the long-term thinking that is so readily present in museum and 
preservation work, yet is weirdly absent or only just nascent in CSCW or HCI (Voida, Harmon and Al-Ani, 
2011, is a notable example of the nascent). Brand’s considerations of change over time, evolution of structure, 
and alteration of existing structures for unanticipated use are all apt for studies of databases. 


3 Case Study: The Universal Chalcidoidea Database 


Our case study focuses on a taxonomic database describing an interesting, but somewhat obscure group of 
organisms: the chalcid wasps. Described as “gem-like inhabitants of the woodlands by most never seen nor 
dreamt of” (Girault, 1925), these parasitic wasps are beautiful, plentiful, but often minuscule, making them 
hard to collect and study. There are an estimated 500,000 species of chalcid wasps in existence, yet only 
22,000 have been named and described (Noyes, 2003). The Universal Chalcidoidea Database (UCD) is one 
taxonomist’s effort to collect and make available all existing literature on these organisms and their nomenclature. 
Like many natural history databases, the now electronic UCD was once made of paper: in this case 
a “taxonomic card catalog” — a database of a group of organisms’ names, namers and name changes — 
“compiled and maintained until about 1969 by the ‘indexing section’ of the Department of Entomology at 
The Natural History Museum, London” (“History of the Database”). In the mid 1970s, Dr John Noyes, a 
specialist in wasps, joined the staff and began augmenting the taxonomic index with an exhaustive 
bibliography of related literature; though exact use metrics have not been published, the database’s website 
describes the resulting card catalog and bibliography as being “constantly in use” by researchers on the group. 
In 1991, Noyes began migrating the card catalog to an electronic database; in 1998 this database was made 
available on CD-ROM; and in 2002 it was migrated to the “Taxapad” data management system. In August 
2011, Noyes began work with the Illinois Natural History Survey (INHS) to integrate the UCD with a larger 
system of taxonomic databases: Species File Software, maintained by the INHS. 


3.1 Methods and goals of migration (or, how we learned the database) 


The UCD database was given to us in a number of formats, some more usable than others (e.g. databases 
in proprietary file formats that couldn’t be read without their originating program; a folder containing a 
number of text files; and finally some SQL dumps that lacked relations). We decided to work with the SQL 
dumps, and try to reverse engineer the relations between tables. We also had Noyes' documentation for the 
database, initially written to aid student workers at the NHM with data input and database maintenance. 
These materials included an entity-relationship diagram, of sorts (Figure 1), as well as an extensive pdf 
describing the contents of each field in each table. We encountered some of the usual and expected rough 
spots with migration: despite the schema diagram and other documentation, we still needed to contact Noyes to 
clear up questions. The meanings of some table and field names were unclear -- some because of technical 
constraints (e.g. character limits on field names; field limits on tables) and some because of our inexperience 
with the database. In other cases, though, it was because the actual structure of the database had been 
changed from its original design: Noyes’ database had “learned,” so to speak, to adapt to the changing 
conditions in his lab. 
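
One common heuristic for recovering relations from a dump that lacks them is value containment. The sketch below illustrates that heuristic under stated assumptions; it is not a description of the procedure actually followed for the UCD. It assumes the dump has been loaded into SQLite, and the function name, column names and sample rows are hypothetical. A column is reported as a candidate foreign key when all of its non-null values occur in a column of another table whose values are unique per row.

```python
import sqlite3


def candidate_foreign_keys(conn):
    """Return (child_table, child_col, parent_table, parent_col) candidates."""
    cur = conn.cursor()
    tables = [r[0] for r in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()]
    values, row_counts = {}, {}
    for table in tables:
        row_counts[table] = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        columns = [r[1] for r in cur.execute(f'PRAGMA table_info("{table}")').fetchall()]
        for column in columns:
            rows = cur.execute(
                f'SELECT DISTINCT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL').fetchall()
            values[(table, column)] = {r[0] for r in rows}
    found = []
    for (child_t, child_c), child_vals in values.items():
        for (parent_t, parent_c), parent_vals in values.items():
            if child_t == parent_t or not child_vals:
                continue
            # The parent column must look like a key: one distinct value per row.
            looks_like_key = len(parent_vals) == row_counts[parent_t]
            if looks_like_key and child_vals <= parent_vals:
                found.append((child_t, child_c, parent_t, parent_c))
    return sorted(found)


# Hypothetical two-table dump with no declared relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref (ref_id INTEGER, title TEXT);
    CREATE TABLE taxon (taxon_id INTEGER, name TEXT, ref_id INTEGER);
    INSERT INTO ref VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO taxon VALUES (10, 'x', 1), (11, 'y', 3), (12, 'z', 3);
""")
links = candidate_foreign_keys(conn)
print(links)
```

Candidates found this way still need confirmation from documentation or from the database's maintainer, since small integer columns can overlap by coincidence.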

[Figure: UCD Flowchart] 
Figure 1: Noyes’ “Flow Chart” describing his database’s original schema 


4 Discussion 


4.1 How this database learned 


In some ways, it’s insufficient to only refer to the UCD as a database — it’s also an extremely well-curated 
dataset. Noyes often had to rely on non-expert, unpaid volunteers for data entry, so he had to stringently 
check all of their work before “accepting” it into the database. However, instead of adding additional 
fields into the primary set of tables, Noyes created proxy tables into which volunteers could enter their 
data. Noyes then would manually migrate these new records to the main set of tables. This double layering 
of tables isn’t reflected in the database’s schema (Figure 1) because it was a later addition to the database’s 
structure; we only realized what was going on after encountering seeming duplicates or versions of the same 
table (“ref” and “newref”) and contacting Noyes for clarification. 
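
The proxy-table arrangement described above resembles a staging pattern, which the following minimal sketch illustrates. Only the table names “ref” and “newref” come from the UCD; the columns, the "approved" flag and both function names are assumptions made for illustration, and in the UCD the curator migrated records manually rather than with a script like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref    (ref_id INTEGER PRIMARY KEY, citation TEXT NOT NULL);
    CREATE TABLE newref (entry_id INTEGER PRIMARY KEY, citation TEXT NOT NULL,
                         entered_by TEXT, approved INTEGER DEFAULT 0);
""")


def submit(conn, citation, entered_by):
    """Volunteer data entry goes into the proxy table, never the main table."""
    conn.execute("INSERT INTO newref (citation, entered_by) VALUES (?, ?)",
                 (citation, entered_by))


def promote_approved(conn):
    """Copy curator-approved rows into the main table and clear them from staging."""
    with conn:  # both statements commit together, or roll back together on error
        conn.execute(
            "INSERT INTO ref (citation) SELECT citation FROM newref WHERE approved = 1")
        moved = conn.execute("DELETE FROM newref WHERE approved = 1").rowcount
    return moved


submit(conn, "Citation one", "volunteer-1")
submit(conn, "Citation two", "volunteer-2")
# The curator checks the first entry and marks it as accepted.
conn.execute("UPDATE newref SET approved = 1 WHERE citation = 'Citation one'")
moved = promote_approved(conn)
print(moved)
```

Keeping unreviewed entries in a separate table means the main table never holds unchecked data, at the cost of a second set of tables that a later reader of the schema has to discover.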

After more thoroughly comparing Noyes’ “Flowchart” and the SQL dump we were given, we realized 
that Noyes’ add-on construction had been quite extensive. Our database includes 34 tables, whereas Noyes’ 
original schema only contains 22. Again, after consulting with Noyes, we learned that he had “ingested” 
several other datasets into his over the course of the UCD’s lifespan, and had, furthermore, begun using the 
UCD for local data management of some related-but-separate projects, such as a table titled “crencyrt” 
which contains data from a survey of Costa Rican Encyrtidae ranges otherwise unrelated to the rest of Noyes’ 
data aggregation efforts. 
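
A comparison of this kind, documented schema against actual tables, can be sketched in a few lines. The sketch assumes the dump is loaded into SQLite and that the documented table names have been transcribed by hand from the schema diagram; the function name, the "taxon" table and the column names are hypothetical, and only “ref”, “newref” and “crencyrt” are table names taken from the UCD.

```python
import sqlite3


def compare_schema(conn, documented):
    """Return (tables present but undocumented, tables documented but absent)."""
    actual = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    documented = set(documented)
    return sorted(actual - documented), sorted(documented - actual)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ref (ref_id INTEGER);
    CREATE TABLE newref (ref_id INTEGER);
    CREATE TABLE crencyrt (record_id INTEGER);
""")
# Hypothetical transcription of the documented schema.
added, missing = compare_schema(conn, documented=["ref", "taxon"])
print(added, missing)
```

The set difference only shows that tables were added or dropped; learning why still required asking the person who maintained the database.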

As we implied through our literature review, much of the prior work in databases has been either 
ethnographic, and therefore difficult to generalize, or extremely mathematical -- only looking at database 
contents and change over time in terms of set theory and mathematical relationships. Here we want to show 
how sets and workplace practices affect each other over time, and Brand’s framing allows us to do just that: 
to study the interplay between engineering, culture, and the day-to-day getting on with it. In the case of 
the UCD, the changes to this database particularly lend themselves to Brand’s architectural metaphors: 
tables were “added on” like spare rooms to make room for an expanding “family” of users. Because Noyes 
so carefully curated his data, we did not find some of the quirks of long-term use that we have observed in 
our own prior work with databases, such as gradual change in the use of certain fields over time (the 
repurposing of a room, in Brand’s rendering), or of shearing of large tables into smaller subsections. 


5 Conclusion 


To borrow further from Brand, we believe that it is useful to view a database not just as a completed 
product, but as something that is in the process of change; the process of database construction is not 
confined to the period of its original design. As with houses, older databases contain evidence of how they 
were built, how they have changed, and sometimes even why. In the UCD, we see evidence of design 
processes that allow for the safe handling of possibly erroneous data entry, as well as repurposing the database 
for a subproject. We were able to take advantage of various supplementary sources of evidence to inform 
this analysis in addition to the database itself. 

We believe that studies like this can give us a richer understanding of how longer lived databases 
subtly change over time, thereby informing not only their initial design and management, but their 
long-term maintenance and preservation as well. It is hardly controversial to argue that databases should 
last, but we want to ask how they survive, really, particularly now that our longest lived databases (those 
found in memory institutions like libraries and museums) are becoming increasingly electronic, and thus 
increasingly at risk of being treated as wholly mathematical entities rather than the complex human-created 
artifacts that they in fact are. Many present preservation techniques further put them at risk of being 
preserved as pristine, atemporal entities; rather like historic homes that have never changed from the 
moment that they were built, databases preserved in this manner will be seen but not used. 
Understanding how databases learn will help us understand the features that contribute to databases’ 
long-term usefulness and usability over years, decades, and even centuries. 


Abstract 

Public libraries around the world are adding collaborative creative spaces, often known as makerspaces, 
which facilitate hands-on activities with digital and electronic tools such as 3D printers and soldering 
irons, as well as more traditional tools, such as sewing machines and wood working materials. Many 
makerspaces incorporate art with STEM (Science, Technology, Engineering and Mathematics) to create 
a STEAM-charged participatory culture that encourages people who were not previously inclined to code 
or solder to interact with science and technology in ways they had not before. This poster synthesizes 
three qualitative studies, one of which is ongoing, using grounded theory methods to build a picture of 
makerspaces in rural public libraries from the perspective of the users and librarians. 


Keywords: makerspaces, public libraries, access, intellectual freedom 

Citation: Barniskis, S. C. (2014). STEAM: Science and Art Meet in Rural Library Makerspaces. In iConference 2014 Proceedings 
(p. 834-837). doi:10.9776/14158 

Copyright: Copyright is held by the author. 

Acknowledgements: Thanks to Dr. Joyce M. Latham for her input on this study. 


Contact: Crawfo55@uwm.edu 


1 Introduction 


Public libraries around the world are adding or considering the idea of collaborative creative spaces, often 
known as makerspaces. These spaces facilitate hands-on activities with digital and electronic tools such as 
3D printers and soldering irons, as well as more traditional tools, such as sewing machines and wood working 
materials. Many makerspaces incorporate art with STEM (Science, Technology, Engineering and 
Mathematics) to create a STEAM-charged participatory culture that encourages people who were not 
previously inclined to code or solder to interact with science and technology in ways they had not before. 
Only one study has explored public library makerspaces thus far (Slatter & Howard, 2013), and more 
exploration is needed to understand the uses and impacts of such spaces, as well as how they align with 
more traditional library services. 


2 Method 


This poster synthesizes three qualitative studies of public library creative spaces. Using Vaughan’s (1992) 
theory elaboration methods, each study builds on the other, though the units of analysis, participants, and 
research methods vary. The studies’ research questions overlap to explore different perspectives of the 
makerspace experience. 

The studies include: 


• An ongoing ethnographic exploration of how users of a rural library makerspace interact with tools 
  and each other. This study asks how users ages 12 and older understand and use makerspaces, and 
  explores their feelings of creativity, agency, and social capital. 

• An interview-based study of librarians in eight small Wisconsin communities, exploring their reasons 
  for planning or adding makerspace services. This study asks librarians how they position 
  makerspace services in terms of intellectual freedom and access. 

• A content analysis of six limited life histories (Barnhurst, 1998) of makers ages 12 and older who 
  have used library makerspace equipment or participated in arts programs, describing their memories 
  of making things. This study asks how makers experience creativity and how they position public 
  libraries in their creative lives. 


The data from each phase has been analyzed using Charmaz’s (2006) constructivist grounded theory 
methods. The results present an emerging view of libraries as innovative creation spaces, even in the smallest 
communities, and describe a mandate for facilitating participatory engagement with ideas and 
knowledge-building. 


3 The Literature 


The theoretical perspectives of intellectual freedom and access to knowledge ground this study in Library 
and Information Studies (LIS), as well as in the disciplines of education, communication, and art, a few of 
which are highlighted here. Ribot and Peluso (2003) frame access as both the ability to interact with things 
as well as the social relationships and processes that allow a person to derive opportunity benefits. This is 
a positive perspective of intellectual freedom and access, the flip side of the coin of the negative justice 
concept that opposes censorship, as is operationalized in the ALA’s Library Bill of Rights. 

The digital divide literature also touches on the need for access that is based not only on equipment 
or bandwidth, but on the networks that make the use of digital tools possible (Chen, 2013; Warschauer, 
2002). These system-based concepts of access intersect with the systems theory of creativity 
(Csikszentmihalyi, 2009), participatory culture (Gauntlett, 2011; Jenkins, 2009), and the STEAM model of 
contextualized learning (Boy, 2013; Madden et al., 2013). The concepts of library-as-place and social capital 
are relevant as well (e.g. Britton & Considine, 2012; Charmaraman, 2013; Varheim, 2009). 


4 The Themes 


The sorted data from the three studies falls into four broad categories (affordances, social interactions, 
instrumentality, and access), broken down here into twelve themes that show how the patrons, library staff, 
and the tools and space itself interact to co-construct the experience of public library makerspaces. 


4.1 Tools & Space 


DISPLACEMENT: Enabling new skills to displace old, by using digital tools to create 
AFFORDANCES: Erecting or opening boundaries around the possible uses of the space and tools 


4.2 Librarians 


ACCESS: Offering another way to access information, allying this access with the mission to ensure 
intellectual freedom 

ENGAGEMENT: Encouraging community, social, and creative engagement with library, each other, and 
technology 

RELEVANCE: Marketing library as technologically leading-edge, and demonstrating relevance in users’ 
lives 

INSTRUMENTALITY: Offering pathways to economically valuable skills, civic engagement, other goals 
“more important” than making 


4.3 Makers 


COLLABORATION: Sharing knowledge, revealing hidden talents, bringing together disparate groups and 
institutions 

SURPRISE: Finding the only outlet for creative activities outside one’s home in an unexpected place, 
enjoying a new perspective of library possibilities 

CREATIVITY: Describing a joy in the process of creating and sharing, which “makes life worth living” 


MATERIALITY: Working with one’s hands, “turning ideas into physical things,” responding to a feeling 
of alienation from technology 

DISCOVERY: Learning new things that one would never have otherwise tried, from soldering LEDs to 
cooking new types of food 

AGENCY: Having power over the things and processes in one’s life, feeling able to impact one’s community 
and library 


5 Discussion of Key Findings 


5.1 Access 


Participants in all three studies framed their interaction with the makerspace in terms of access. Patrons 
say they could not otherwise access the tools or community they enjoy without the library providing them 
freely. One maker described the distinction as providing access not to the old “grocery store” model of 
libraries, but to a “kitchen,” where knowledge was as readily created as consumed. Librarians considered 
the provision of access to knowledge as an intellectual freedom and social justice issue. Instead of leveraging 
the intellectual freedom mission to oppose censorship and ensure access, these librarians actively facilitate 
access to tools and knowledge as a social justice issue. This finding has implications for LIS scholarship and 
policy. 


5.2 Social Spaces 


All participants described the makerspace as a social space where makers could share the discovery process, 
support each other, and engage socially. Residents of small rural communities described few opportunities 
to gather, to learn from one another, or to work together. Makerspace tools, while interesting and exciting, 
were often considered less important to the act of collaborative creation and knowledge-building. This 
finding has implications for funders or policy-makers who might otherwise focus on the technology instead 
of the social interactions at the heart of these spaces. 


5.3 Instrumentality 


Some librarians spoke to the educational or economic impacts of offering high-tech makerspaces, including 
luring businesses in with opportunities to prototype designs, marketing the library as tech-savvy, or 
attracting people that would never enter the library otherwise. In fact, patrons expressed surprised pleasure 
in the role of the library as creation space. Some only came to the library to use the makerspace. While 
patrons also noted that interacting with the makerspace could allow others to learn useful skills for jobs, 
they focused more on their enjoyment in their personal use of the space. This finding describes the 
alignments and discontinuities of how stakeholders envision the impacts of the library differently. 


5.4 Diving in 

Study participants described an exhilarating sense of “diving in” and trying things outside their comfort 
zones, along with an attendant sense of agency and power. Users, especially women, who participated in a 
soldering workshop, expressed a new empowered relationship to electronics. That said, not everyone engaged 
with the spaces equally, with females preferring to add new technology-based skills by incorporating them 
into more traditional craft skills, as with e-textiles. 

The affordances of the spaces interact to allow some activities while prohibiting others. For example, 
the use of a makerspace by one age group may preclude use by other age groups, and privileging high-tech 
tools may marginalize those who prefer to use more familiar technologies, while rules about quiet or mess 
may act as barriers to engagement. Librarians played a critical enzymatic role in cultivating a culture of 
play, intergenerational sharing, and willingness to fail, which helped patrons who were more hesitant to try 
new things. This finding offers insight for practitioners as they plan and implement their spaces, especially 
in terms of promoting participatory culture in the library. 


6 Conclusion 


A librarian said, 


Magic is what the library needs to do ... Neil Gaiman [said] ... if he could write something on the 
wall of every library ... he would just, in the children’s section, write “And then what happened?” 
He was talking about that magic of the imagination and the spark. I think that having something 
material really makes it magic. It’s less like “Yeah, I’m imagining something,” and that’s great, 
than “I have this thing in my hand now that wasn’t there 20 minutes ago.” 


The initial results of these studies show that when small libraries offer potentially disruptive technologies 
such as 3D printing, magic can happen. However, makerspace tools alone are not enough. Librarians play 
a key role in creating a culture of engagement with the makerspaces and between patrons. The affordances 
of the space, policies, tools, and programs encourage users to try the tools of production and participatory 
culture to which they would otherwise have no access or, sometimes, interest. These tools need not be
expensive or high-tech to support access to knowledge.
Abstract

This poster presents how space interpretation, as a cultural dimension, influences students’ use of
social network sites (SNSs) in computer-mediated collaboration at the University of Edinburgh. The
results show that the concepts of private and public were the core factors of space interpretation.
Students categorized two primary social network sites, Facebook and Blog, as private and public
space respectively, and used them for different purposes in computer-mediated collaboration. Blog was
mainly treated as an exhibition space; Facebook was used as a private space for sharing information
and communicating, which also indicated a transformation in the cognition of personal space. Moreover,
the space interpretation of Facebook was influenced by the group’s level of cultural diversity or
homogeneity. The distance provided by Facebook was not close enough for a culturally diverse group to
disclose their opinions during the collaboration. The study suggests that attention to the cultural
dimension can improve SNS design for customization and content control, which would improve
information sharing and communication and optimize the use of SNSs for computer-mediated collaboration.


Keywords: computer-mediated collaboration, social network site, cultural perspective 

Citation: Huang, H.-Y. (2014). Space Interpretation of Social Network Site from Cultural Perspective: A Case Study of
Computer-mediated Collaboration in Digital Media Studio Project. In iConference 2014 Proceedings (pp. 838-843). doi:10.9776/14254
Copyright: Copyright is held by the author. 

Contact: hhsiaoying@gmail.com 


1 Introduction 


With the rapid development of Web 2.0, computer-mediated collaboration has become a critical research field.
Prior studies focused on group cohesion and on optimizing online collaboration (Fussell, Kraut,
Lerch, Scherlis, McNally & Cadiz, 1998; Lou, Abrami & D’Apollonia, 2001). Recently, social and
psychological aspects, such as the perception of group belonging and trust, have also drawn attention because
they are essential for effective learning and collaboration (Kreijns, Kirschner & Jochems, 2003; Kirkman,
Rosen, Gibson, Tesluk & McPherson, 2002; Kirkman, Rosen, Tesluk & Gibson, 2006; Paul & McDaniel,
2004). Researchers have also identified conundrums that influence group performance, such as coordination
complications, social loafing in virtual groups, and the difficulty of establishing and sustaining social
interaction that depends on trust or a sense of belonging (Kreijns et al., 2003). On the other hand, these
conundrums could prompt more meaningful cognitive processes through individual interpretation, such as
not taking critique as a personal attack (Weinel, Bannert, Zumbach, Ulrich Hoppe & Malzahn, 2011).

As Fulk, Steinfield & Schmitz (1987) pointed out, the information richness of media
affects collaboration through individuals’ interpretation. In computer-mediated collaboration, the
lack of face-to-face (FTF) interaction, which results in individual distrust (Hill, Bartol, Tesluk & Langa, 2009), has
been an important issue for researchers and practitioners. Several studies have suggested combining
computer-mediated communication with FTF communication to improve the effectiveness of
computer-mediated work (Duarte & Snyder, 2001; Kirkman et al., 2002; Lipnack & Stamps, 2000). Although these
studies have identified critical factors influencing collaborative behaviors in computer-mediated groups,
few have examined the influence of individuals’ interpretation of space in a computer-mediated environment.
Space is an important factor because the concept of space influences an individual’s
behaviors and can be regarded as a cultural dimension (Hall & Hall, 1990). Space could therefore be a useful
lens for exploring attitudes and behaviors in media use, especially in multicultural computer-mediated
collaboration. To fill this gap, the present study adopted a cultural dimension to investigate how
the interpretation of space affects the use of social network sites (SNSs) in the computer-mediated collaboration
of the digital animation groups in the Digital Media Studio Project at the University of Edinburgh.


2 Space as Culturally Analytic Dimension 


As a way of communicating through personal interpretation (Hall & Hall, 1990), space has been studied
as a cultural dimension of society for decades. Fulk et al. (1987) also indicated that space, as a
social environment, provides important social information through different types of communication among
coworkers. That is, space, whether an online or offline setting, may be affected by an individual’s
interpretation and may directly or indirectly influence collaboration. Investigating the use of social
network sites through the cultural dimension of space is therefore meaningful for the development of
computer-mediated collaboration, especially in cross-cultural environments.


3 Research Method 


This study employed a case study with participant observation, individual interviews, and an online questionnaire
from February to May 2012. Participants were students in the Digital Media Studio Project (DMSP)
at the University of Edinburgh who were majoring in digital animation. The animation group had twelve students
(N=12), divided into two small groups. Each group had six members: four animators,
one sound maker, and one musician. Group members came from different cultures. The task assigned by the
supervisors was to make a digital animation and create a website to document the design
process. Two supervisors acted as assistants and advised only on the animation.

Students communicated through different channels. Both face-to-face and computer-mediated
communication, such as email or SNS, were used during collaboration. Both groups chose Facebook as
their communication platform. To document the working process, both groups created a website and a Blog to
display the development of their work.

Of the twelve students, seven completed the online questionnaire and nine
took part in individual interviews. The findings are based on the students’ group discussions, texts on
SNS, individual interviews, and the online questionnaire.


4 Private Facebook and Public Blog in CMC 


The notions of private and public influenced the space interpretation of SNS. This study found that
Blog was defined as “formal” and Facebook as “non-formal”. That is, Facebook, rather than Blog, was regarded as a
private space where any idea and opinion could be discussed.


4.1 Facebook: A Communicating and Sharing Information Space 


Students formed a Facebook group as a limited-access online space that was much more private than
Blog. Students discussed everything on Facebook, including meeting times, progress reports,
opinions and suggestions on others’ work, and negotiation and collaboration issues. The texts on Facebook
followed spoken language. Words such as “cool”, “great”, and “ha” appeared often, especially in
responses, and grammar was not necessarily checked. A post on Facebook is part of a dialogue for the
purpose of communication. A Facebook group may not be a perfectly private space but, for the students, it
provided the privacy controls to exclude non-relevant users, which let coworkers focus on
communicating.
4.1.1 Blog: An Exhibited Space


Blog, in contrast to Facebook, was defined as a public and official space. Students exhibited more completed
work and fewer opinions on Blog. They did not discuss working details or pass personal messages
on Blog. Blog is a public field watched by supervisors, peers, and anonymous viewers, so students
behaved carefully and conservatively when publishing in this “public” space. Most Blog posts displayed the
progress of the animation rather than individual work. Reporting “what has been done” was the primary purpose of
Blog posting.


4.2 Blog is “We”; Facebook is “I” 


The results show that the interpretation of space in SNS could affect group and individual identity.
What distinguishes Facebook from Blog is that individuals shifted their standpoint from the
group to the personal. For example, one student, Hank, a composer, wrote the following message about how he
arranged the music:


I added the opening music to Dropbox folder entitled Music. I've timed the little piccolo runs at 
the start to start and stop in time with when the box moves. I've taken some liberties with the 
timing of the rest of it to make that section last a bit longer. I had the idea that when the violin 
slide starts, that's when the camera would pan over slightly to the box... 


In contrast, Hank’s post on Blog presented group work rather than individual work.
The post is as follows:


We discussed as a group that the opening sequence could be longer, so I left gaps in between these 
lines. ... The happy/heroic feeling is quickly snatched away in the final cadence as the next scene is 
back inside the box. I figured by ending it on a slightly darker tone, it gives the impression that 
something sinister is in the box. So when we see it’s a little caterpillar, it seems pretty funny... 


He focused on the music for the animation. The word “I” here was tied to work done for the animation.
When mentioning the decision to extend the opening music, the student not only used “we” as the subject
but also emphasized “discussed as a group”. He stressed that it was a group decision, not his own. He put the
“I” under the “we” that appeared as an entity on Blog.

On Facebook, Hank used “I” more often to express his ideas about the animation. The sentences were
informal and incomplete. Moreover, he made the decision by himself and explained the reason on
Facebook. He uploaded the work but did not ask for opinions; he still focused on the animation but did not
emphasize “we” or “group”. He spoke from a personal angle. The “we” in the paragraph above represented
the role of viewers rather than a group. That is, if a space is more private and without authority, the
consciousness of the group as an entity becomes implicit and individuals’ opinions become explicit. Compared with
Blog, Facebook could be seen as a privately public space (Lange, 2007).


5 The Cultural Influence of Communicating Distance on Facebook 


Although both groups used Facebook as a private communication platform, the results show that
SNS use differed among students with different cultural backgrounds. Group A
consisted mainly of students from the same background and thus represented cultural homogeneity; group B
was the opposite, with cultural diversity. The topics group A discussed on Facebook included
progress reports, idea sharing, negotiating the division of work, and comments on others’ work.
In contrast, group B’s posts concerned meeting arrangements, progress reports, and
idea sharing. Comments and opinions seldom appeared on Facebook. When asked why the group did not hold
discussions on Facebook, an Asian female student in group B answered:
We are not used to talk or discuss things on Facebook. The Westerns do this more often than us.
We like to talk face-to-face. 


The only student in group B with a European background, by contrast, reported that she was used to
communicating work details on Facebook. The results suggest that interpretations of the space on
Facebook were influenced by cultural background, which led students to adopt different communication strategies.

From a cultural perspective, certain Asian cultures, such as those of Korea and China, tend to communicate
at close distance because of collectivism (Kim, Sohn, & Choi, 2011). Moreover, in terms of Internet
use, researchers have indicated that Asians put more emphasis on social interaction than on information
seeking (Chau, Cole, Massey, Montoya-Weiss, & O’Keefe, 2002). However, the Asian students in group B
did not interact more online. Apparently, for group B, the distance provided by Facebook was not close
enough for sharing important information, only for reporting and scheduling. The high
cultural diversity that exists within Asia was reflected in group B, for which Facebook could not meet the
need to sustain relationships. In contrast, although the students in group A came from different countries,
most of them shared European culture, so they could adopt a similar cultural context in which
the distance was acceptable for discussion on Facebook.

The results illustrate that a group with high cultural homogeneity can share a close space
interpretation, which facilitates collaboration through social network sites. In computer-mediated
collaboration, the distance in virtual space can be enlarged by the degree of cultural diversity, which may
influence personal information behaviour. Therefore, as social network sites become a collaborative venue
that hosts different levels of cultural diversity, a specific space for collaboration needs to be set apart,
and its information structure redesigned, to diminish cultural influences and enhance the quality
of collaboration.


[Figure: diagram with a Public/Private axis and an Individual/Group axis]

Figure 1: Space Interpretation of Blog in Computer-mediated Collaboration

Figure 2: Space Interpretation of Facebook in Computer-mediated Collaboration


6 Conclusions 


This study suggests that SNS use can be influenced by space interpretation in computer-mediated
collaboration. Private and public space, the dimension students weighed when interpreting
the space, is in fact a ramification of identity. Identity includes the concepts of “I” and “We”,
or individual and group. However, the dichotomy is neither simple nor straightforward, especially in
the context of computer-mediated communication (West, Lewis, & Currie, 2009).

The results have design implications for SNSs employed as supportive tools in
computer-mediated collaboration. The study found that a group’s level of cultural diversity tended
to enlarge the interpreted distance of the space, which affected communication and collaboration on the SNS. In
other words, current SNSs still need to overcome cultural boundaries and develop the cultural trust
that influences computer-mediated collaboration. As groups or organizations increasingly supplement
their collaboration with SNSs, it is important to be aware of cultural touch points, where the interpretation
of space may limit how well different individuals’ and groups’ social needs are met. More
customization and content control for information sharing and communication could help to
optimize SNS use for computer-mediated collaboration.

As for the limitations of this study, the supervisors’ suggestions might have affected students’ attitudes toward
SNS use. The involvement of supervisors made Blog a more formal and serious space, which
influenced students’ behavior, so that the interpretations of Facebook and Blog as spaces became polarized.
For future study, the core issue is to map out an elaborated cultural dimension as an analytic indicator to
evaluate and improve the information and interface design of social network sites for computer-mediated
collaboration.
Abstract

This exploratory research studies microblog users’ collective sensemaking process in response to a
public health threat: the developing H7N9 flu pandemic in China. We examined question-bearing
microblogs and responses during the 31 days after the first confirmed cases were reported. We
found that: 1) different gaps were present during the 5 uncertainty phases; 2) many topical gaps
appeared at both collective and individual levels; 3) users employed multiple approaches to deal with and
bridge the gaps. The results suggest that users used social media as a platform for collective
sensemaking, and that collaboration among different types of users may help to best utilize social media as a
platform for reducing uncertainty and promoting information openness and transparency in such cases.
1 Introduction
The recent development of social media has greatly changed the way people seek and make sense of
information. On one hand, social media allows citizens to produce and disseminate information,
decentralizing the control of traditional official sources. On the other hand, unverified information, rumors,
and spam can make it difficult for people to make sense of large numbers of microblogs. Users participate
in the collective sensemaking process by reading, re-posting (retweeting), and commenting on microblog
posts. Sometimes, users reach a collective understanding (the outcome of sensemaking); sometimes they discover
more gaps or confusion.

This paper examines the use of Weibo (one of the top 3 microblogging platforms in China) in
response to the 2013 H7N9 flu pandemic. We applied Dervin’s gap identification and bridging metaphor
(Dervin, 1992, 1998) to examine the gaps in knowledge of individual users and of users collectively as a group. The
research questions are:


1) What are the gaps that social media users identified during the first month of the H7N9 flu? 
2) How do users identify and deal with the gaps, individually and collectively? 


2 Related Research 


2.1 Sensemaking 


Sensemaking research seems to agree that the sensemaking process is iterative where the sensemaker goes 
through several rounds of “gap identification — gap bridging” (Dervin, 1992, 1998), “information foraging — 
sensemaking” (Pirolli & Card, 2005), or “sensing — making sense” (Weick, 1995) to reach an understanding 
to base action upon (Russell et al., 1993; Stefik et al., 2002; Dervin & Naumer, 2010). Different types of 
information are sought at different stages of information seeking and use (Kuhlthau, 1993), perhaps because 
different gaps in knowledge are encountered. Background information, for example, is often sought at the 
opening stage when users were engaged in problem definition and picture building (White, 1975; Kuhlthau,
1991). Collaborative sensemaking research examined how individuals together find and organize information 
at both individual and group levels, and the interactions between and among individuals seem to be critical 
(Paul & Reddy, 2010; Heverin & Zach, 2011). 


2.2 Emergency Responses 


Research on the use of social media in response to emergency situations has increased recently, including
natural disasters such as wildfires (Sutton, Palen, & Shklovski, 2008; Vieweg et al., 2010) and earthquakes
(Qu et al., 2011), and public safety threats such as school shootings (Vieweg et al., 2008; Heverin & Zach, 2011).
Chew and Eysenbach (2010) found that during the 2009 H1N1 outbreak, Twitter was used to disseminate information from
credible sources to the public, as well as a platform to share personal experiences and opinions. Social media
can be used to detect real or perceived concerns of the public, allowing officials to be aware of them and respond
accordingly in these emergency situations.


2.3 Questions in Online Social Networks 


Many gaps in knowledge appear in the form of questions. Questions and questioning seem to be effective for
requesting information in online social networks (Zhang, 2012). Both factual and conversational
questions seen in social media are important for sensemaking: factual questions solicit answers that fill in 
knowledge gaps, whereas conversational questions carry on discussions (Harper, Moy, & Konstan, 2009; 
Efron & Winget, 2010). Relatively less is known about how questions in online social networks contribute 
to the sensemaking of individuals and groups. 


3 Methods 


Data collection. We collected microblog posts using the keyword search “H7N9 | bird flu (禽流感)”
that were posted during the month after March 31, 2013, when the Chinese Center for Disease Control and
Prevention (CDC) first reported 2 confirmed H7N9 cases. Microblogs that contained a question mark or
keywords likely to signal a question (Li & Thompson, 1989) were selected for content analysis. The
data set contains 1484 question-bearing microblogs, their 4454 re-posts (retweets), and 4043 responses.
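
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not list the keywords it used, so QUESTION_KEYWORDS is a hypothetical set of common Mandarin question words, and the function names are invented for this sketch rather than taken from the study.

```python
# Hypothetical keyword list: the paper does not publish the keywords it used.
QUESTION_KEYWORDS = ("吗", "什么", "怎么", "是否")
# ASCII and full-width question marks, since Weibo posts may use either.
QUESTION_MARKS = ("?", "？")


def is_question_bearing(text, keywords=QUESTION_KEYWORDS):
    """Return True if a post has a question mark or a question keyword."""
    if any(mark in text for mark in QUESTION_MARKS):
        return True
    return any(word in text for word in keywords)


def select_question_posts(posts):
    """Keep only the posts that look like they carry a question."""
    return [post for post in posts if is_question_bearing(post)]
```

A keyword match is a coarse filter, so a sample selected this way would still need the manual content analysis the authors describe.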

Data analysis. We first examined the overall trend of uncertainty levels in association with the number
of H7N9-related posts and the number of reported H7N9 cases, and divided the whole process into 5 phases. We
coded the question-bearing microblogs and their re-posts and responses according to the following three
dimensions:


1) Topic: open coding; 
2) Types of gaps: individual, collective, or both; 
3) Approaches for dealing with gaps: open coding. 


4 Results 


4.1 Overall Uncertainty Trend 


We divided the course of 31 days into 5 uncertainty phases based on the percentage of questions raised and 
the content of the posts (Figure 1). 
Figure 1: Uncertainty Phases during 31 days after the Official H7N9 Report


Phase 1. Very high uncertainty: March 31 to April 2. This was the first time the H7N9 virus
was detected in humans, and people knew very little about it.

Phase 2. High uncertainty: April 3 to April 8. The CDC reported more infected cases with a much
higher mortality rate than previous flu pandemics. This phase saw the highest number of posts
sharing information, asking questions, spreading rumors, and so on.

Phase 3. Decreased uncertainty: April 9 to April 12. The growth rate of reported cases was
steady, and more information was released about the reported cases, possible causes, prevention,
treatments, and so on.

Phase 4. Medium uncertainty: April 13 to April 20. A slightly higher growth rate and two media
reports (a. 2 cases were confirmed in Beijing; b. WHO does not rule out the possibility of
human-to-human transmission) resulted in medium uncertainty and an increased number of H7N9-related
posts.

Phase 5. Low uncertainty: April 21 to April 30. A steady rate of increase with no evidence of
human-to-human transmission further lowered the uncertainty level. On April 24, the CDC decided to
change the daily information release to a weekly release.
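
For readers who want to bin posts by phase, the date ranges listed above can be written as a small lookup table. This is a minimal sketch, not the authors' procedure: the names PHASES and phase_for are assumptions made for illustration, and the year 2013 comes from the data collection period.

```python
from datetime import date

# (phase number, uncertainty label, first day, last day), copied from the
# phase descriptions above; all dates fall in 2013.
PHASES = [
    (1, "very high", date(2013, 3, 31), date(2013, 4, 2)),
    (2, "high", date(2013, 4, 3), date(2013, 4, 8)),
    (3, "decreased", date(2013, 4, 9), date(2013, 4, 12)),
    (4, "medium", date(2013, 4, 13), date(2013, 4, 20)),
    (5, "low", date(2013, 4, 21), date(2013, 4, 30)),
]


def phase_for(day):
    """Return (phase number, label) for a date, or None if out of range."""
    for number, label, start, end in PHASES:
        if start <= day <= end:
            return number, label
    return None


# Number of days covered by all phases together (both endpoints inclusive).
TOTAL_DAYS = sum((end - start).days + 1 for _, _, start, end in PHASES)
```

The five ranges are contiguous and together cover the 31 days from March 31 to April 30, which matches the observation window stated in the paper.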


4.2 Types of Gaps


We observed 2 types of gaps, depending on whether the concern is collective or individual. For example, in the
early phases with very high and high levels of uncertainty, a very common gap in knowledge was “what is
H7N9?”, where little was known about this new virus at both collective and individual levels. Sometimes
the same topical gap may appear in different forms at collective and individual levels. For example, the
general public would like to know what symptoms the reported cases showed, whereas individuals with
flu-like symptoms often ask, “am I infected with H7N9?” Table 1 shows example gaps raised at the collective
and individual levels on different topics and sub-topics, and the phases they appear in.
Sub-topic | Collective | Individual | Phases
What is | What is H7N9? | What is H7N9? | 1
Cause | What is the cause of the H7N9 flu? | Can I eat chicken, duck, or eggs? | 1,2
Symptom | What are the symptoms of the H7N9 flu? | Am I infected with H7N9? | 2
Transmission / Prevention | What is the transmission channel (esp. human-to-human)? | How can I avoid getting the flu? | 2
Transmission / Prevention | When will a vaccine be developed? | Can I get a flu shot? | 2,4
Treatment / Cure | What drugs are used for treating the flu? | Should I be tested for the virus at a hospital if I have flu-like symptoms? | 2,3,4,5
Information release and openness | Why was the official CDC report 20 days later than when the infections were confirmed? | Why was the official CDC report 20 days later than when the infections were confirmed? | 2,3
Information release and openness | What is the number of infected cases? Who are they? Are their close contacts infected? | How close are the cases to where I live? | 2,3,4,5
Flu pandemic | [...] pandemic like the 2003 SARS or the H1N1 swine flu? | Should I leave the infected areas? | 2,3
Flu pandemic | Is the pandemic over? | Is the pandemic over? | 3,4,5
Flu pandemic | What’s the emergency response plan? | What’s the emergency response plan? | 2
Consequence | Is the change in stock market related to the flu? | Is our meeting cancelled because of the flu? | 2,3,4,5

Table 1: Example Gaps at Collective and Individual Levels 


4.3 Identification and Dealing of Gaps 
Posting a microblog with a question about the gap is a very common way to identify a gap either at an 
individual level or at the collective level. The question may be posted to the user’s entire social network, or 
may be directed to some individuals (Evans, Kairam, & Pirolli, 2010). 
Questions, especially targeted questions, often initiated conversations that help the participants deal with 
the gaps collectively. 

Reposting (retweeting) a microblog with questions added to the original post is another way of
identifying gaps. For example, when the CDC report first came out, a relatively influential user (the
CEO of a tech company) retweeted an official news report from a news channel and
commented “Why is the official CDC report more than 20 days late than the 2 fatal cases are confirmed?”
This question was retweeted by several other influential users (some with millions of followers), identifying a
huge gap in public knowledge.

When dealing with gaps in public knowledge or at an individual level, we observed 4 main types
of responses:

1) No action
2) Seek answers from trusted official sources, such as government media or CDC and WHO reports,
then post answers citing these sources if the search and sensemaking are successful
3) Seek answers from close social network and friends, usually without posting answers or updates when
sensemaking is done
4) In anticipation of questions, voluntarily post information about general knowledge of the H7N9 virus and
reported cases; users in the healthcare profession seem to do this more often.


5 Conclusion 


This exploratory analysis looks into the collective sensemaking process through microblog posts
during the first month of the H7N9 flu pandemic. The microblogging site Weibo seems to be a useful platform
for users to identify and bridge gaps related to the developing H7N9 flu pandemic. Similar to Chew and Eysenbach (2010),
we found official sources, experiences from close friends, and opinions of healthcare professionals all to be
useful resources for individual and group sensemaking. Research shows that a loosely connected
group of people can work together to come to an understanding of a situation (Vieweg et al., 2008); the
connectivity and fast information flow in social media may further facilitate this collective
sensemaking process. Collaboration among different types of users may help to best utilize social media
as a platform for reducing uncertainty and promoting information openness and transparency in such
emergency cases.


6 References 


Dervin, B. (1992). From the mind's eye of the user: the sense-making qualitative-quantitative 
methodology. In J.D. Glazier & R.R. Powell (Eds.), Qualitative research in information 
management (pp. 64-81). Englewood: Libraries Unlimited. 

Dervin, B. (1998). Sense-Making theory and practice: An overview of user interests in knowledge seeking 
and use. Journal of Knowledge Management, 2(2), 36-46. 

Dervin, B., & Naumer, C. M. (2010). Sense-making. In M.J. Bates & M.N. Maac (Eds.), Encyclopedia of 
library and information sciences (3rd ed., pp 4696-4707). Boca Raton, FL: Taylor and Francis. 

Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: content analysis of Tweets during 
the 2009 H1N1 outbreak. PloS One, 5(11), e14118. 

Efron, M., & Winget, M. (2010). Questions are content: a taxonomy of questions in a microblogging 
environment. Paper presented at the Proceedings of the 73rd ASIS&T Annual Meeting on
Navigating Streams in an Information Ecosystem - Volume 47, Pittsburgh, Pennsylvania. 

Heverin, T., & Zach, L. (2011). Use of microblogging for collective sense-making during violent crises: A 
study of three campus shootings. Journal of the American Society for Information Science and 
Technology, 63(1), 34-47. 

Kuhlthau, C.C. (1991). Inside the search process: Information seeking from the user's perspective. Journal 
of the American Society for Information Science and Technology, 42, 361-371. 

Kuhlthau, C. C. (1993). Implementing a process approach to information skills: A study identifying 
indicators of success in library media programs. School Library Media Quarterly, 22(1), 11-18. 

Li, C. N., & Thompson, S. A. (1989). Mandarin Chinese: A Functional Reference Grammar: University of 
California Press. 

Paul, S.A., & Reddy, M.C. (2010). Understanding together: sensemaking in collaborative information 
seeking. Paper presented at the Proceedings of the 2010 ACM conference on Computer supported 
cooperative work, Savannah, Georgia, USA. 

Pirolli, P., & Card, S. (2005). The sensemaking process and leverage points for analyst technology as 
identified through cognitive task analysis. In Proceedings of International Conference on 
Intelligence Analysis (Vol. 5, pp. 2-4). 




iConference 2014 Pengyi Zhang & Duohuai Gao 


Qu, Y., Huang, C., Zhang, P., & Zhang, J. (2011). Microblogging after a major disaster in China: a case 
study of the 2010 Yushu earthquake. In Proceedings of the ACM 2011 conference on Computer 
supported cooperative work (pp. 25-34). ACM. 

Russell, D.M., Stefik, M.J., Pirolli, P., & Card, S.K. (1993). The cost structure of sensemaking, 
Proceedings of the INTERACT '93 and CHI '93 conference on Human factors in computing 
systems. Amsterdam, The Netherlands: ACM Press. 

Stefik, M.J., Baldonado, M.Q.W., Bobrow, D., Card, S., Everett, J., Lavendel, G., et al. (1999). The 
knowledge sharing challenge: The sensemaking white paper: PARC, Inc. 

Sutton, J., Palen, L., & Shklovski, I. (2008). Backchannels on the front lines: Emergent use of social media in
the 2007 Southern California fires. Proc. ISCRAM.

Vieweg, S., Hughes, A. L., Starbird, K., & Palen, L. (2010). Microblogging during two natural hazards 
events: what twitter may contribute to situational awareness. In Proceedings of CHI '10 (pp. 1079-1088).

Vieweg, S., Palen, L., Liu, S. B., Hughes, A. L., & Sutton, J. (2008). Collective intelligence in disaster: An 
examination of the phenomenon in the aftermath of the 2007 Virginia Tech shootings. In 
Proceedings of the Information Systems for Crisis Response and Management Conference 
(ISCRAM). 

Weick, K.E. (1995). Sensemaking in organizations (3rd ed.). Sage Publications, Incorporated.

White, M.D. (1975). The Communications behavior of academic economists in research phases. Library
Quarterly, 45(4), 337-354.

Zhang, P. (2012). Information seeking through microblog questions: The impact of social capital and 
relationships. Proceedings of the American Society for Information Science and Technology, 49(1). 


7 Table of Figures 
Figure 1: Uncertainty Phrases during 31 days after the Official H7N9 Report


8 Table of Tables 
Table 1: Example Gaps at Collective and Individual Levels




Textual Directions and Cognitive Workload 


Cristina Robles Bahm¹ and Stephen C. Hirtle¹
¹ University of Pittsburgh


Abstract 

This project examines and compares the inferred cognitive workload of detailed and non-detailed textual 
directions in a navigation task. A user study was conducted where participants navigated through two 
virtual worlds, one urban and one rural, while following detailed and concise sets of textual directions. 
While navigating, a secondary task measure was used to infer cognitive workload. It was found that 
although there is no statistical difference between the detailed and non-detailed directions in both 
environments, there was a difference between the measured cognitive workload and the perceived 
cognitive workload on the rural map. A trend was also present on one of the maps that showed detailed 
directions in a simple environment may be redundant. It is important to know how many cognitive 
resources are allocated when performing a navigation task because it gives insight into how automatically 
generated directions, in systems such as GPS, should be disseminated to users. It also gives insight into 
how to communicate spatial information in general. 

1 Introduction

Wayfinding is a complex task that involves orienting oneself in space, identifying decision points, estimating
distances, and tracking location (Allen, 1997), and each of these subtasks requires cognitive resources
(hereafter “resources”). Because wayfinding is also a movement task, a person must take in and process additional
spatial, visual, and textual information (Tversky, 1993). It is important to know how many cognitive
resources are allocated when performing a navigation task because it gives insight into how automatically
generated directions, in systems such as GPS, should be disseminated to users. It also gives insight
into how to communicate spatial information in general.

Each person possesses a limited amount of resources, which are allocated to a variety of tasks
(Kahneman, 1973). For instance, when driving while using a GPS, one will naturally split resources between
following the given directions and driving. Resources are allocated to each task depending on how many
the task requires and how many are available. Regardless of how many tasks receive resources,
the total amount of available resources remains approximately the same (Chewar & McCrickard,
2002). This is significant because only so much information can be processed at one time.
Thus it is better to design directions that take fewer resources
away from the navigation portion of a wayfinding task.

It is generally accepted that the more concisely textual directions are written, the fewer
resources are used for the navigation task (Marcus et al., 1996): the less text there is to process, the
fewer resources are needed. This would be a practical summary if wayfinding were not the complex
task described above. By taking into account Kahneman’s model of attention and effort, this project begins
to explore the influence of longer directions on both the perceived and the inferred cognitive workload.

Kahneman's model of attention and effort provides a framework to begin discussing cognitive 
resources. According to Kahneman’s model, the amount of resources available is limited, but how they are 
allocated is flexible. 

The cognitive aspect of communicating navigational information has been studied from many
points of view. From attention allocation while using mobile phones (Patten et al., 2003) to the best method
of presenting a map to a user (Rouben & Terveen, 2007), this is a well-studied area of cognitive psychology.
Particular to spatial cognition, the way human beings produce and comprehend route directions when
speaking to each other has also been studied (Allen, 1997). In the spatial communication community there
has also been a focus on defining a data structure for route directions, specifically in urban environments:
a framework was developed that aimed to use urban knowledge in a way that could be applied
by the cognitive system (Klippel et al., 2009). Lastly, the question of what makes directions “good” has been
explored (Lovelace et al., 1999) but remains unanswered.

In order to examine the interplay between cognitive workload and the detail level of textual 
directions, a user study was conducted where participants followed detailed and non-detailed directions in 
two virtual worlds (maps) while performing a secondary task. Participants were asked to navigate to a 
goal while counting backwards by two. The counting task served as a secondary task which allowed the 
researchers to infer a measurement of cognitive workload performance during the navigation task. 
Participants explored both a city environment and a rural environment. When done, their perceptions of 
the tasks were measured through a short survey. 

This project used a verbal secondary task because it interrupted the navigational experience enough 
for the participant to consider the task to be difficult, but still allowed them to focus on the navigation 
itself. The verbal secondary task chosen was to count backwards aloud in sets of twos from 4000 for the 
first navigation task then from 2000 for the second navigation task. The assumption is that the more 
resources the participant used while navigating, the fewer resources available to count backwards. Shorter 
sequences may indicate fewer resources available. 
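
As a rough illustration of how such a secondary-task measure might be scored, the sketch below takes a transcribed list of the numbers a participant said aloud and returns the length of the longest run in which each number is exactly two less than the one before it. The function name, the sample transcripts, and the choice of "longest correct run" as the score are assumptions made for illustration; the study does not specify its scoring procedure.

```python
def longest_correct_run(spoken, step=2):
    """Length of the longest run where each number is `step` below the one before it."""
    if not spoken:
        return 0
    best = current = 1
    for previous, number in zip(spoken, spoken[1:]):
        if previous - number == step:
            current += 1
        else:
            current = 1  # an error or restart breaks the run
        best = max(best, current)
    return best


# Hypothetical transcripts (not data from the study).
fluent = [4000, 3998, 3996, 3994, 3992, 3990]
interrupted = [4000, 3998, 3996, 3990, 3988, 3986, 3984]

print(longest_correct_run(fluent))       # 6
print(longest_correct_run(interrupted))  # 4
```

Under the assumption stated in the text, a shorter run would be read as a sign that more resources were being spent on navigation at that point.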

A t-test on each map, urban and rural, shows no significant difference between the completion times
for each detail level. The t-test comparing the non-detailed and detailed directions yields
t(.244), p = .810 for the urban map and t(1.921), p = .071 for the rural map. There is also no difference in the number of
times a participant was “lost” during the navigation or in the length of the secondary task sequence
between detail levels on either map.
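
For readers who want to see how a comparison of this kind can be computed, the following is a minimal sketch of a pooled-variance t-test for two independent samples, using only the Python standard library. It is not the authors' analysis: the text does not say which t-test variant was used or report degrees of freedom, and the completion times below are hypothetical. The two-sided p-value is obtained by numerically integrating the Student's t density with Simpson's rule.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


def independent_t_test(a, b):
    """Pooled-variance t-test for two independent samples; returns (t, df, p)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df, two_sided_p(t, df)


# Hypothetical completion times in seconds (not data from the study).
detailed = [310, 295, 342, 330, 301]
concise = [298, 305, 322, 289, 315]
t, df, p = independent_t_test(detailed, concise)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

If one assumes 17 degrees of freedom (19 participants split into two independent groups, which the text does not state), `two_sided_p(1.921, 17)` falls between .05 and .10, in line with the reported p = .071 for the rural map.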

Although not statistically significant, there is a trend present in the rural map, as shown in the
p-values above. The trend suggests there might be a difference in the completion times between detailed and
non-detailed directions. This is likely because detailed directions are redundant in the less complex
rural environment. Overall, these results are interesting because they show that it does not take more
time for participants to navigate an environment using detailed directions, even though most of the detailed
directions were three times longer than their non-detailed counterparts.

The survey administered to participants had two types of questions: two Likert scale questions and
four free form questions. Responses to the Likert scale questions “How satisfied are you with these
directions?” and “How hard were you working following these directions?” ranged from one through five,
with one being the lowest rating and five the highest. On the rural map, even though
performance on the secondary task showed no difference, a comparison of the means of the Likert scale
questions showed that participants thought the detailed directions made them work harder.


2 Conclusion 


First, the results show there is no difference between the inferred cognitive workload on each map or the 
number of times a participant got “lost.” For the completion time of the navigation, there was no difference 
between the detailed and concise directions on the urban map. For the rural map, there was a trend present 
that showed there could be a difference, but with a p-value of .07 further testing is needed to determine the 
significance of this. 

Second, the performance of participants did not show a difference between detailed and concise
direction sets, but the survey data from the experiment told a different story. On the rural map, even though
performance on the secondary task showed no difference, participants answered on the
survey that the detailed directions made them work harder. One possible explanation is
that the rural and urban maps were different environments. The lack of visual information in the
rural environment may have made the extra navigational information in the detailed directions less useful,
and therefore unnecessary in the perception of the participants. When navigating in a more complex
environment such as the urban map, the extra navigational information may have been more useful and
therefore not perceived as unnecessary or as an extra cognitive burden.

Third, regardless of the secondary task performance and the answers to the Likert scale
questions, when asked the free form questions “Which directions were the hardest for you to
follow? Why?”, “Which directions made you work the most?”, and “Which task did you think was the
hardest? Why?”, 14 out of 19 participants stated that the detailed directions made them work harder than
the non-detailed directions. One possible explanation is that the
inferred cognitive workload of a navigational task differs in a key way from the perceived cognitive workload.

Abstract

Universal access is the objective of digital library development. However, it is a challenge for blind users 
to search information effectively in digital libraries because of their dynamic design and multimedia 
collections. Serving as the preliminary study of a large scale project, this study focuses on the 
identification of types of help-seeking situations unique to blind users at the cognitive level. Based on the 
analysis of 15 blind users’ pre-questionnaires, pre-interviews, think-aloud protocols, transaction logs and 
post-interviews, the authors identified blind users’ typical help-seeking situations in relation to cognitive 
overload, comprehension and reasoning. Implications for how to design better help features for blind users 
to overcome these situations are also discussed. 

1 Introduction

Blind users interact with information retrieval (IR) systems, including digital libraries (DLs), in entirely 
different ways from sighted users. In this study, "blind users" refers to individuals who lack the functional 
sight to see information presented on a computer screen. They predominantly rely on text-to-speech software 
called screen-readers (SRs) to interact with computers and the Internet (Lazar et al., 2007). 

DLs are defined as digital content created by libraries and cultural heritage institutions, excluding
the digital content purchased from publishers. As multimedia DLs proliferate, more difficult situations for
blind users occur, and help mechanisms play an essential role in assisting them in effective information
searching. Help mechanisms are defined as overall help systems that help users use an IR system
(Xie & Cool, 2009). In this study, the help-seeking situation is characterized by a person needing help in
the context of an information search, including browsing within a DL, in order to achieve his/her tasks/goals.

This poster focuses on blind users’ help-seeking situations at the cognitive level. Through literature
review, we identified multiple cognitive constraints of the blind in information use on the Internet: 1)
avoidance of pages containing severe accessibility problems, such as dynamic content (Craven, 2003; Bigham
et al., 2007); 2) problems understanding page/site structure when browsing as well as difficulties with the
serialized-monolithic presentation of SRs (Salampasis et al., 2005); 3) sequential nature of interaction,
meaning at any given point a blind user perceives only a snippet of the content, and loses all contextual
information (Lazar et al., 2007); 4) mere translation of text content into synthetic speech, and not a
complete narration of information presented (Babu, 2011), so that important cues embedded in color, images and
videos that aid in navigation and interpretation are lost (Leuthold et al., 2008); 5) cognitive overload from
spending cognitive resources in trying to understand the browser, the web site, and the SR simultaneously
as well as being forced to hear repeated information across pages (Chandrashekar, 2010; Theofanos and
Redish, 2003); and 6) improper labeling causing significant confusion, frustration, and disorientation,
particularly for interface objects such as buttons and input fields (Lazar et al., 2007).

However, previous literature provides neither in-depth discussion of how and why help-seeking
situations arise for the blind in IR interaction, nor insight into their unique cognitions, perceptions and
actions. A closer examination of their cognition and behavior in DL interactions is needed. This study
investigates the research question: What types of help-seeking situations do blind users face at the cognitive
level in using a DL?


2 Methodology 

This study was designed to explore blind users' help-seeking situations in gathering information from a DL.
Fifteen blind adults from the Greater Milwaukee area were recruited through regional blind associations;
each received $100 as compensation for time and transportation expenses. Qualifications for
participation included reliance on screen readers to interact with computers and at least three years of
experience in Internet use. Each experiment session was conducted at the usability testing lab in a state
university. Each session comprised a pre-interview, a think-aloud observation, and a post-interview, lasting
a total of three hours. These participants represent blind users of different ages, genders and search skills.
Detailed demographic data are omitted because of space limitations.

A laptop with Internet Explorer 10, JAWS 12 and Morae 3.1 was used for this study. JAWS is the 
most popular SR in the blind community, and Morae software captures participant verbalization, screen 
video, and transaction logs. The American Memory Digital Collections was selected for this study because 
of its popularity and diverse help features. The pre-search interview included questions seeking perceptions 
about help mechanisms and help-seeking behavior in Internet use. The participants were instructed to first 
perform a 10-minute familiarization task to explore the DL and its functionality, and then to conduct three 
30-minute search tasks while thinking aloud: 1) known-item search (Find the Letter written by Alexander 
Graham Bell to Helen Keller dated March 23, 1907); 2) specific information search (Find when and how 
Presidents Lincoln and Garfield were assassinated); and 3) exploratory (Identify some U.S. immigration 
policy issues using multiple sources). The post-search interview solicited feedback on interaction experiences 
with the DL and its help features, as well as participants’ overall assessment and help-seeking situations 
faced during the searches. Interviews were audio-recorded and transcribed in their entirety, including 
participant verbalizations, SR announcements, and investigator observations. 

An open coding method was used for analyzing the transcripts. Five independent coders
participated, and any disagreement was resolved by group discussion. Types of help-seeking situations were
identified by analyzing both transcripts and transaction logs. The identified help-seeking situations were
classified into three categories: shared by sighted users, unique to blind users at the physical level, and
unique to blind users at the cognitive level. Due to space limitations, only preliminary findings on blind
users’ main help-seeking situations at the cognitive level are reported in this poster. Table 1 presents the data
collection and data analysis plan.


Research question                  Data collection                        Data analysis
Types of help-seeking situations   Pre-questionnaire, pre-interview,      Open coding, taxonomy of
                                   think-aloud protocols, log analysis    help-seeking situations
                                   and post-interviews


Table 1: Data collection and data analysis plan 

3 Results


This section reports our preliminary results on help-seeking situations unique to blind users at the cognitive 
level in DL searching. We illustrate these results with evidence captured in participant utterances, SR 
announcements (enclosed within < >), and investigator observation (enclosed within {}). Due to space
limitations, we discuss only three help-seeking situations: cognitive overload, comprehension and reasoning.

Cognitive overload refers to the amount of information and interactions that must be processed 
simultaneously. In this study, it specifically refers to the difficulty in processing a large volume of 
information needed for a DL search at the same time. We observed this kind of situation when participants 
tried to interpret the information conveyed by the DL site, the browser, and the screen reader 
simultaneously. They were unable to clearly distinguish the three programs from each other, thereby failing 
to determine the appropriate course of action. The following illustrates the disoriented state of a participant 
who thought she was on the American Memory site but was actually trapped in the browser’s address bar. 


Why didn’t that . . . The Jaws search didn’t provide anything for Lincoln. I wasn’t expecting that. 


<Compatibility checkbox not checked. Title list AMLC dash Browse by Category. Windows 
Internet Explorer. Escape.> 


{Sigh}. Well, we’re in the right spot. 

< Escape. Compatibility check box not checked.> 

I seem to have gotten out of browse mode somehow. 

<Tool bar refresh left parenth f5. Toolbar. Compatibility check box not checked. Escape.> 


I’m in some kind of a menu system I don’t like. I’m getting out of there by hitting escape. Using
the h button I was expecting to go back to the headers.


And it’s not. 

<Escape. PC Cursor>

How did I get out of my browse mode? 
...I’m stuck in a tool bar.


Comprehension refers to the ability to understand the purpose of a DL function from its label and
arrangement. We observed that participants had difficulty understanding DL functions that were either
unfamiliar or not accompanied by a description. For instance, they could not understand the utility of
browsing category items, hyperlinks, and search result organization criteria due to improper labeling. The
following illustrates the frustration of a participant who could not understand the utility of an unlabeled
decorative graphic.


<Link Abraham Lincoln. The Stern Collection. Blank. Link heading level 3 today in history. 
Heading level 3 July 18. Link graphic images slash underline icon.> 


Those things, I hate those. They don’t make any sense. It tells me it’s a graphic, but it doesn’t tell 
me what it is. 


Reasoning refers to making sense of different structures within DLs based on logical thinking. Participants 
faced a help-seeking situation when they could not make sense of interface structures, browsing categories, 
and organization of search results in DLs. They could not logically understand structures as they could 
perceive only a small fraction of the content at a time provided by SR. The following illustrates such a 
situation, where a participant was navigating down a long list of browse categories, but could not understand 
them. 

I very seldom use categories like this, because they’re too slow. You have to read too much, and
you don’t know what their categories are, so then you have to go through a whole bunch of 
extraneous stuff ...It sounds to me like instead of broadening the collection, you could probably go 
up to that link where it says, “browse the entire collection” and maybe they’d give you basic 
categories instead of subcategories. That might be... I don’t know. 


4 Discussion and Conclusion 


This study has both theoretical and practical implications. Cognitive overload situations occurred when
participants had to simultaneously process information from the DL, the browser and the SR. The confusion
prevented them from adopting the right course of action. This situation could be overcome by 1) announcement
of keyboard focus location, 2) shortcut keys to exit out of the current cursor location, and 3) virtual
integration of all three into one. Comprehension situations were caused by improper labeling, which could be
resolved if the content and controls of a DL function included meaningful labels supplemented with descriptive
instructions. Reasoning situations happened when participants could not make sense of the structures of an
interface, browsing categories, a page, a category or search results. One solution could be presenting a
descriptive summary that explains the current location with respect to the overall structure. Another
solution would be a virtual tactile surface that affords the feeling of a 3D model structure (Jeong, 2008).
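
The labeling recommendation lends itself to a simple automated check. As a hedged sketch (not part of the study, and not a help feature the authors propose), the following uses Python's standard `html.parser` module to list images whose `alt` attribute is missing or blank, which is the kind of markup that leaves a screen reader with nothing to announce but a file path. The class name and the page fragment are hypothetical.

```python
from html.parser import HTMLParser


class UnlabeledImageFinder(HTMLParser):
    """Collects the src of every <img> whose alt attribute is missing or blank."""

    def __init__(self):
        super().__init__()
        self.unlabeled = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        if alt is None or not alt.strip():
            self.unlabeled.append(attributes.get("src", ""))


# Hypothetical page fragment (not markup from the American Memory site).
page = """
<a href="/today.html"><img src="images/_icon.gif"></a>
<a href="/lincoln.html"><img src="images/lincoln.jpg" alt="Portrait of Abraham Lincoln"></a>
"""

finder = UnlabeledImageFinder()
finder.feed(page)
print(finder.unlabeled)  # ['images/_icon.gif']
```

An empty `alt` can be appropriate for a purely decorative image that is not a link, so the output of a check like this is a list of candidates for human review rather than a list of confirmed defects.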

This experiment is a pilot study for a large scale project, and its goal is to identify a comprehensive 
list of help-seeking situations at both the physical and cognitive levels with corresponding help mechanisms 
comprising a variety of explicit and implicit help features. We will experimentally validate the utility and 
usability of these help features using an application programming interface after incorporating them into 
existing DLs. 

Abstract

The use of social networking sites (SNS) has had implications in traditional areas of communication such as
identity and relationship construction. This study explores how identity is expressed on Facebook and Twitter,
the two most trafficked SNS (Brenner, 2013). Specifically, this study reports the findings of a survey of young
adults who use these sites. Respondents were asked what prompts them to choose how they
express their identity on Facebook and whether and how it differs from identity expression on Twitter. In addition, this
study examines how Facebook and Twitter can be understood through the sociological theory of identity
negotiation. Implications for the connection between social digital identity and “catfishing” are provided and
discussed.

Keywords: identity, social networks, communication, representation, trust 

Citation: Kaskazi, A. (2014). Social Network Identity: Facebook, Twitter and Identity Negotiation Theory. In iConference 2014 Proceedings 
(p. 858-859). doi:10.9776/14276 

Copyright: Copyright is held by the author. 

Acknowledgements: Special thanks to the iSchool Inclusion Institute of Information Science and Dr. Lynette Kvasny (Pennsylvania State
University)

Contact: Kaskazia@umich.edu


1 Introduction 


Social network sites (SNS) function as socio-technical systems that allow users to broaden their communities and
create and maintain new connections and relationships. As web-based services, SNS allow individuals to construct
a profile, articulate a list of other users with whom they share a connection, and view the connections of others in the
system (Boyd & Ellison, 2007). In Facebook’s 2012 earnings report, the company noted that of its 1 billion profiles,
about 83 million were fake accounts (Facebook, Inc., 2012). Many other SNS host a large number of fake or duplicate
account profiles, some purposely used in the new online trend of “catfishing”. Catfishing occurs when an individual
creates a profile for a pseudo-identity on an SNS and uses it to create and maintain a romantic relationship. The
term originated in a 2010 independent film that documented a deceptive online relationship.

SNS enable users to negotiate an identity online. Identity negotiation theory describes a sociological process in
which people assign roles during the formation of a relationship. It is broken into two phases. In the first phase,
people look for others who see them as they see themselves and approach interactions that are likely to uphold their
self-view and self-esteem. This is known as self-verification. In the second phase, people make predictions about how
the other person will behave, and then act in ways that are likely to make the prediction true. This is called
behavioral confirmation (Swann & Ely, 1984).


2 Conclusion 


This study provides insights into the identity negotiation process that occurs on Facebook and Twitter. The
self-verification phase takes place when a user creates a profile and adds content. Self-view, or how a person thinks
about his/her personality, can be perceived from the user’s profile. Users project their self-views onto their profiles
through posts, photos and comments. Users then allow friends or followers to view that content, who will likely
uphold that self-view. Facebook users have more control over who can view their profile than Twitter users. This
may explain why many Facebook users are more mindful of their audience and why users are less likely to add an
unfamiliar person to their network. In order to protect their identity and preserve their self-view, users are less
likely to add someone who might threaten their identity.

Survey participants’ responses about Facebook usage reveal that users did not believe their profile was a mirror
representation of their identity. Many participants responded that they “didn’t want to offend any of their family
members or co-workers” and wanted to ensure that information posted could not harm their appearance to others.
In contrast, a large majority of the survey participants felt that Twitter was a better mirror reflection of their
identity. Some explained that they “feel free to express themselves” and never thought about who could read their
tweets, so they were able to express more aspects of their personality.

Identity negotiation becomes difficult if a person has encountered a catfish profile. Catfish profiles create
problems in the identity negotiation process by using misleading information to gain the trust of another person.
Based on the survey responses, we conclude that catfishing is more likely to take place or begin on Twitter, where
people are more likely to make new connections and less likely to be cautious of an unfamiliar profile. Twitter allows
users to use any name on their profile and in their handle; it even allows users to have multiple handles. While
there are benefits to levels of anonymity and pseudo-identities, catfishing and its negative consequences create
distrust and suspicion among users. Identity verification remains a critical component of trust building between
users, particularly when engaging in romantic relationships.

Though Facebook and Twitter are used in different ways, both sites allow users to create and portray an
identity and interact with other users. Despite the differences in how users interact with SNS, both Twitter and
Facebook are commonly used for building and maintaining social relationships. The primary difference between
identity on Facebook and Twitter for my survey participants is a belief that Twitter allows them to portray a
broader portion of their personality, whereas Facebook is more of a representation of what they want other users to
see. Survey respondents view Twitter as the raw, unrefined version of a user’s identity, while a Facebook profile is
seen as a polished, edited version of a user’s online identity. Twitter users feel less pressure from the audience
viewing their profiles. The identity negotiation process that takes place on SNS relies more on honesty and
communication than traditional interactions do, because there are no other social cues such as first impressions or body
language to add to the information users are gathering on a person’s identity and behavior.

Abstract

In this paper, we aim to make clear what problems might occur when public libraries in Japan introduce 
e-books. We conducted semi-structured interviews with nine public library directors. These interviews 
revealed a wide range of opinions about e-book use in public libraries. Seven main factors emerged: a) 
necessity of e-book introduction, b) recognition of e-books, c) budget for e-books, d) management of
e-books, e) difference between public library users and e-book users, f) accessibility of e-books, and g)
digitalization of library resources. We also analyzed the differences in perspectives on e-books between 
two kinds of public library management styles. 

1 Introduction


Public libraries play a leading role in society as information service institutions. In this digital age, public 
libraries are required to provide access to digital content. In Japan, 2010 was called “the first year of 
electronic books” (hereafter “e-books”) because of the sale in Japan of devices such as iPads and Kindles.
In this paper, we focus on e-book services in public libraries. 

In the United States of America (USA), 76% of public libraries have already introduced e-books 
[1]. In contrast, only 7.6% of public libraries in Japan have introduced e-books, based on a survey of 225 
public libraries in municipalities with populations mainly over 100,000 [2]. This statistic shows that e-book 
services in the USA have advanced further than those in Japan. In this research, we will investigate the 
factors that affect this difference. 

The introduction of e-books to public libraries has two aspects: the library side and the provider 
side. The provider side includes publishers, authors, and bookstores. Some provider side problems have 
gradually begun to be cleared up. These include author copyright [3], agreements on digitalization between 
authors and publishers [4], the unique distribution of publications in Japan [5], and lack of digital content 
[5]. 

As preliminary work, we focused on the public library side and identified the factors involved in 
introducing e-book services. 


2 Methodology 


We conducted interviews with nine public library directors. Their demographic information is shown in 
Table 1. Four of these libraries are managed by the local government (hereafter “LGM library”), while the 
other five have adopted the “Designated Administrator System” (hereafter “DAS library”). In a DAS 
library, the local government outsources library management to the private sector, including nonprofit 
organizations or stock companies. The local government can expect the enhancement of library services and 
the cutting of government costs, making good use of the private sector’s effective management style,
originality, and ingenuity [6]. In 2012, this system was adopted in 10.5% of Japan’s public libraries (332 of 
3154) [7]. Most directors of DAS libraries were not originally local government employees or public 
librarians. In this research, four of the nine directors came from different fields, as shown in Table 1. 

This research is a qualitative and exploratory study. We conducted semi-structured interviews with 
the nine directors from July to August 2013; the average length of each interview was approximately one 
hour. The interviews were recorded and important points were loosely transcribed. We addressed nine 
topics. 

In this paper, we particularly analyze their opinions about e-books, which were obtained from the 
following two questions. 


• Do you think that it is necessary to introduce e-books to the public library? 
• What were the merits and demerits of e-book use when your library introduced them? 


Director  Management Style                 Career
A         Managed by local government      Local government employee with library experience
B         Managed by local government      Local government employee with library experience
C         Managed by local government      Local government employee without library experience
D         Managed by local government      Local government employee without library experience
E         Designated Administrator System  Part-time library staff
F         Designated Administrator System  Company employee without relevance to library
G         Designated Administrator System  Bookstore staff
H         Designated Administrator System  School teacher
I         Designated Administrator System  University librarian

Table 1: Demographic information on the nine public library directors 


3 Results 


We only focus on problems with the library side in this paper. These interviews revealed a wide range of 
opinions about e-book use in public libraries, from which seven main factors emerged. 


3.1 Necessity of e-book introduction 


Eight of the nine directors (88.9%), excluding Director H, recognized that it will be necessary to introduce 
e-books in the future. Director H’s library has already introduced e-books. 


3.2 Recognition of e-books 

E-books are gradually being recognized. However, Directors D, E, G, and H believed that e-books have not 
gained widespread public acceptance. For example, Director E conducted a trial of e-books. The results 
indicated that library users did not fully understand the merits of e-books or how to use them. Director H 
introduced e-books three years ago. However, users’ e-book usage has not increased as much as the 
library expected. They believe that this was affected by the lack of content and library users’ lack of 
receptivity to e-books. 


3.3 Budget for e-books 
The budget system differs between LGM and DAS libraries. However, both types of libraries’ directors 
believe that there is little possibility that they will be given enough of a budget for e-books. 

An LGM library’s budget is given to each expense item, such as labor costs, operating costs, or 
resource costs, by the local government. However, directors of LGM libraries recognize that it is difficult to 
acquire an additional budget for e-books. Director D mentioned that they lacked the foundation to persuade 
the local government to acquire a budget for e-books. 

A DAS library’s budget is given in full to library management and then the director allocates a 
portion of the budget to each expense item. However, in general, most DAS libraries are given a minimum 
budget because the local government tries to decrease their budget as much as possible. 


3.4 Management of e-books 


Directors E and H had different perspectives about the management of e-books. Director E mentioned that 
paper resources are worn and damaged by graffiti and delinquency, and thought that these problems could 
be solved, and resources protected, if e-books were introduced. On the other hand, Director H was concerned 
that it may be difficult to maintain e-book devices because these devices are usually more expensive and 
delicate than other resources. Users are limited to using these devices inside the library. We found 
that there were both merits and demerits in terms of e-book management. 


3.5 Difference between public library users and e-book users 


Director C mentioned that there was a difference between public library users and e-book users. Public 
library users are mainly parents and children, seniors, and retired people. E-book users are mainly young 
people in their teens and twenties. Director C guessed that there would only be a minor effect, even if 
e-books were introduced immediately, but also guessed that the public library may be able to attract young 
people as potential users by introducing e-books. 


3.6 Accessibility of e-books 


Director A mentioned that e-books are more difficult to browse than paper books. Thus, as the use of 
e-books increases, it will be difficult for librarians to provide proper reading guides or reader advisory services. 
Director B mentioned that there is no guarantee that people will be able to access e-books permanently 
if the library cancels a subscription or if the company that provides the e-books goes bankrupt, because 
e-books do not become public libraries’ property. 


3.7 Digitalization of library resources 


Two directors were more interested in the digitalization of their own libraries’ resources than commercial 
e-books. Director C stated that it is necessary to digitize the library’s local collections, which contain rare 
antique documents. Director B mentioned that the digitalization of their own libraries’ resources had higher 
priority than the introduction of commercial e-books. Director B argued for the importance of the 
digitalization of local administrative documents. Director B also mentioned that the local administrative 
documents are held in paper form in public libraries and that digitalized documents are made and held by 
each local government department. However, Director B pointed out that only the people who are familiar 
with the given government organization or institution are able to access the digitalized documents easily. 
It is important for public libraries to hold these documents in order to ensure equal access to documents 
for all people, regardless of whether they are familiar with the local government or not. 


4 Discussion 


The interviews revealed that all nine public library directors recognized the necessity of the introduction of 
e-books to the public library system. In addition, we found that perspectives differed between directors of 
LGM libraries and DAS libraries. 

Directors of DAS libraries were more likely to intend to actively introduce e-books than directors 
of LGM libraries were, despite the fact that e-book use in public libraries is not common yet. They are 
expected to provide attractive and impressive services. They must achieve satisfactory results in order to 
renew their next contract. On the other hand, we recognized that directors of LGM libraries were more 
interested in the digitalization of their own collections, especially local administrative documents, local 
materials, and local collections. They are members of the local government and therefore aim to create more 
community-based libraries. 

Public libraries ought to introduce e-books in the near future. However, many of the directors we 
interviewed pointed out that there are also many problems outside of the library regarding e-books. Public 
libraries are required to work with other public libraries, publishing companies, national governments, and 
the Japanese national library (National Diet Library) in order to solve these problems. 

The results of this study reveal these issues from the viewpoint of public library directors. We plan 
to solve these problems by referring to the good examples of e-book introduction in the USA. 
Abstract 

There has been a sizeable investment in the development of large-scale data and appropriate 
infrastructures in the physical and biological sciences and increasingly in the social sciences and 
humanities. Concerns about data sustainability have attracted a great deal of attention as research 
project data collection represents a significant investment, and loss of subsequent use of that data 
represents a loss of potential value. In this poster, we focus on one of the most long-lived examples of data 
archives: Social Science Data Archives (SSDAs). In this study, we report on preliminary research on the 
historical, institutional, and operational dimensions of SSDAs over time. Drawing upon analyses of 
institutional and policy documents and interviews with staff, depositors, and administrators, this poster 
briefly discusses current challenges to SSDA longevity and implications for next steps in expanding the 
study both theoretically and methodologically. Initial themes discussed in this poster include data 
archives making a market for themselves, configuring their products and their user base, and ongoing 
tensions between the need to generate revenue and pressure for open access data. 
1 Data Archives and Sustainability 


This project examines strategies employed by data archives to remain sustainable over time. It focuses on 
the most long-lived examples of data archives: Social Science Data Archives (SSDAs). The social sciences 
have enjoyed stable data archives since the 1940s (Green and Gutmann, 2007). Their longevity provides an 
opportunity to examine archive sustainability through changes in funding, technology, and organizational 
infrastructure. 

We report on one historical case study: the Inter-university Consortium for Political and Social 
Research (ICPSR). We compare these historical themes with a much smaller and newer SSDA, the Irish 
Social Science Data Archives (ISSDA). 

We explore two research questions: 


• What are some key strategies employed by ICPSR to remain sustainable? 
• How do the historical themes from ICPSR compare with contemporary issues at ISSDA? 
2 The Emergence of SSDA 


Approximately 16,000 PhDs entered the social sciences in the two decades following the Second World War 
(Lefcowitz and O’Shea, 1963). Emphasis on quantitative research in the social sciences — especially in 
political science, mass communication and economics, but also in history, sociology, and anthropology — 
grew as more public opinion surveys, economic development indicators and geocoded election data became 
available, as well as the computer hardware and software to generate and process them. Data archives such 
as the Roper Center began acquiring machine-readable data in the 1940s and first opened to the general 
public in the 1950s (Scheuch, 2003). In both Europe and the US, the 1960s saw the establishment of over 
a dozen data archives, consortia, and dedicated library services to coordinate data collection efforts across 
institutions, to promote sharing of valuable data and to educate students and scholars about quantitative 
and machine processing analysis methods (White, 1977). New professional associations such as the 
International Association for Social Science Information Services and Technology (IASSIST) in the US, the 
Federation of European Social Data Archives in Europe (FESDA) and the International Federation of Data 
Organizations for Social Science (IFDO) emerged to help develop professionals to manage social science 
data sets and their processing technologies and coordinate international standards efforts and data exchange 
(Nasatir, 1973). 


3 SSDA Cases and Methodology 


ICPSR, located at the University of Michigan in the United States, is a “consortium” composed of members 
who pay dues for access to data and have voting rights within the ICPSR governance structure. ISSDA, 
based in Dublin, Ireland, is not a membership organization, does not have dues, and is primarily funded by 
the University College Dublin library budget (UC Dublin is a publicly funded university). Both curate data 
(as opposed to simply hosting raw data), offer value-added services such as online analysis tools, and support 
educational services or materials. 

For our institution level analysis, we examined 40 years of ICPSR records, including thrice-annual 
governance meeting minutes, annual reports, strategic plans and grant proposals. We conducted 
semi-structured interviews with current and former staff and with researchers who have participated in 
governance. At ISSDA, we conducted interviews with institutional managers, depositors, and users of 
ISSDA. For our field level analysis, we examined documentation and conference proceedings from data 
professional organizations from the 1970s to the present including the IFDO, CESSDA, and IASSIST. 

Analysis of the data is inductive, iterative and informed by the investigators’ theoretical 
orientations. We first read over all historical documentation, making notes and tags in Mendeley. We then 
organized key themes over time. We summarized conference proceedings in a similar fashion. We compared 
institutional-level ICPSR themes with field perspectives provided by the professional society documents. 
Analysis is ongoing; this paper reports preliminary results. We also compared ICPSR themes with findings 
from the contemporary ISSDA case study. 


4 Initial Findings 
In this poster we present three preliminary themes from our historical analysis of the ICPSR data: making 
a market, product and user configurations, and open data and the commons. 


4.1 Making a Market 


ICPSR took various actions over time to shape social science research to align with the ICPSR mission. It 
created a need for itself through the popularization of quantitative computer-assisted analysis of ICPSR 
data sets. ICPSR’s “summer program” workshop series trained researchers and students and mobilized them 
as advocates for ICPSR. To use newly acquired skills, researchers needed access to large sets of 
machine-readable data, which were not readily available outside ICPSR. ICPSR fashioned itself as an essential 
service of social science research by soliciting data from researchers and government agencies for its 
members, curating the data by cleaning it, de-identifying it and providing documentation, and then 
facilitating access and use with users at member universities through a system of “member representatives” 
or data library professionals. Member representatives would take requests from end users and receive the 
data from ICPSR, first through punch cards or tape drives, then later CD-ROMs. ICPSR cultivated 
relationships with member representatives, so that they would also advocate for ICPSR membership within 
their institutions. 

ICPSR also sought to coordinate collections with other data archives to avoid creating competition 
for data sources. For example, in the late 1960s ICPSR and Roper (also a membership data archive) 
coordinated which archive would collect and distribute which data. ICPSR also ensured its longevity 
through standards and professionalization of data practices, and by positioning itself as a source of new 
knowledge about how to archive data. For example, in the mid-1990s ICPSR convened the first committee 
to look into standards for social science data (i.e., DDI). The summer program incorporated classes on data 
curation, and ICPSR changed staff position descriptions to encourage more research activity on curation. 


4.2 Product and User Configuration 


Throughout its history ICPSR has made significant changes around users, the nature of its products, and 
its pricing. 

In the early days, ICPSR sought membership and dues primarily from political science and sociology 
departments at large research universities in the US. As time went on, ICPSR re-positioned itself as 
something libraries should purchase because individual academic departments could no longer afford ICPSR 
fees. Over time with the large US research university market tapped, ICPSR sought to increase international 
and smaller college membership. Along the way, ICPSR also allowed (non-voting) membership by 
government agencies and commercial organizations. 

ICPSR has sought funding from many different governmental and foundation sources over time. 
For example, their early training funding from the National Science Foundation restricted them to 
disciplines NSF considered “scientific” (i.e., few historians), so ICPSR instead appealed to IBM for historian 
training funding. ICPSR has received significant funding from the US Department of Justice (over 14 
million), the National Institutes of Health (over 8 million), and Health and Human Services and the National 
Science Foundation (over 5 and 4 million respectively). Foundation sources like the Robert Wood Johnson 
Foundation and Mellon have provided more modest inputs. 

ICPSR originally shipped punch cards, tape drives and then CD-ROMs of requested data sets to 
organizational representatives (ORs) at members’ universities. These ORs would then facilitate access for 
end users and store the data for local reuse. In the late 1990s, in a major change in distribution patterns, 
ICPSR provided online access directly to some end users via a service “ICPSR Direct” that relied on new 
networking infrastructures. The nature of the ICPSR product changed based on perceived data demands 
within the scholarly fields; for example, scholarly trends encouraged diversification from US voting and 
political science data to acquisition of more international and economics data. 

Most importantly, ICPSR began to offer data hosting services to government agencies in the 1970s. 
Grants and contracts to host “topical archives” for agencies became the largest source of revenue for ICPSR 
after 2002, such that by 2010 it was just under three times ICPSR’s membership income. 

ICPSR adjusted its pricing options over time to either attract new desired customers or deal with 
access problems created by technologies like computer networking. ICPSR’s primary pricing model included 
several different “classes” of universities reflecting differences in whether a university had graduate 
research-oriented programs, or simply undergraduate programs. A further challenge emerged in the 1970s with 
computer networking as member campuses began to share access to ICPSR data with nonmember 
institutions via regional networks. ICPSR responded by creating a new “federated” pricing model that 
accommodated low volume access to new campuses via computer networking with member institutions. It 
also added an international membership category. In 1997 it offered membership to the OhioLink consortium 
in the US. ICPSR has always allowed access to particular data sets for one-time fees. 


4.3 Open Data and the Commons 


From its inception, ICPSR experienced a tension between the archive’s mission to encourage as much use 
as possible and the need to generate income in order to process and curate data and offer educational 
services. ICPSR data collections had typically only been available to members. However, the ICPSR Board has 
always considered and granted “free” access to data in instances where they perceived the scholar’s home 
institution might not be able to afford a subscription or purchase of data sets. Freeriding by economically 
viable institutions that the Board thought ought to be members was not permitted. 

Pressure for “free” access also came from other institutions that began to provide free public access 
to government data and from the government agencies which paid ICPSR to host their data. For example, 
University of Minnesota hosted the 2000 US Census data for free, drawing ICPSR’s members-only access 
into question. By the late 1990s the Membership committee reported that some schools were dropping 
membership because the data faculty needed were available for free. 

Some government agencies who paid ICPSR to host data also required free public access to some 
data. In 1998, about 20% of ICPSR’s data was available for free from its website including the Substance 
Abuse and Mental Health archive hosted for the US Department of Health and Human Services, and the 
National Archive for Computerized Data on Aging hosted for the US National Institute on Aging. 


4.4 Insight into Current Issues: ISSDA 


One challenge that ICPSR has had to address over the course of its history is making the case for itself to 
funding agencies and researchers, something that is only happening in the twelfth year of ISSDA’s existence 
(during a time of extreme economic austerity). While ICPSR has not solved the problem of revenue, it has 
adapted by creating new products and customers. ISSDA’s current challenges stem from a combination of 
historical and contemporary factors: a complicated history of staffing and funding, uncertain ongoing 
funding, and lack of specific expertise in working with social science data. While it has not created new 
products, since research began on this article, ISSDA has taken other steps towards sustaining itself. These 
steps include affiliation with ICPSR and membership in the European Research Infrastructure Consortium 
(ERIC) to increase visibility and gain SSDA expertise. ISSDA is also increasing outreach to potential 
depositors (particularly large grant recipients) and implementing a streamlined process for data deposit and 
use (thus potentially increasing both). 


5 Conclusion and Next Steps 


Future research will include more cases to obtain a fuller picture of SSDAs over time. Furthermore, our 
initial studies suggest that analyzing SSDAs as “knowledge commons” (Hess and Ostrom, 2007) will enrich 
our understanding of SSDAs as institutions and sets of practices. Subsequent work will include more cases 
of SSDAs that serve different fields with different changes in science data practices. Our study of the history 
of SSDAs in curating data and maintaining access will provide insight into the sustainability issues of data 
archives in other fields. 
Abstract 

This study aims to increase the use of virtual reference service by increasing the awareness of the 
availability of the service to users who really need it. A new situationally-based virtual reference interface, 
called the sVR interface, has been designed to reflect different levels of user search success. Findings from 
an eight-month field study done in a university library improved our understanding of how to effectively 
enhance the availability of virtual reference service to users who need it. A discussion about balancing 
the availability and the intrusiveness of virtual reference service is also provided. 
1 Introduction 


Virtual reference services in academic libraries are an important component of library services that have a 
positive perception among patrons. Rawson et al. found that 92% of users in their study 
indicated that virtual reference service was helpful and 93% indicated they would recommend it to a friend 
(2013, p. 96). However, despite this positive perception, virtual reference service has also 
been found to have low usage (Radford & Kern, 2006). Although it has been shown that demand for virtual 
reference service is increasing (Nicol & Crook, 2013), total virtual reference service use remains low (Wagner, 
2013). 

Research has been conducted investigating the use of a pop-up to increase the saliency of virtual 
reference service (Mu, Dimitroff, Jordan, & Burclaff, 2011). Mu et al.’s study also raised the concern that 
using a pop-up may increase the level of intrusion perceived by users. Specifically, they found that users 
considered the pop-up intrusive when they did not need virtual reference service or wanted to continue 
trying by themselves. 

This study aims to make virtual reference service more salient to users who need it. We also examine 
the balance between the intrusiveness and the availability of the virtual reference service. 


2 System Design 


We introduce a situational virtual reference service in an attempt to balance the intrusion of the offer of 
virtual reference service and the availability of the virtual reference service by delivering the offer to those 
who really need help. This study establishes four conditions in which the color, size, and location of the 
virtual reference service are altered based on the level of success of the user’s search in order to attract the 
attention of users who need help. 


• Condition 1: 1-200 results 
• Condition 2: zero results 
• Condition 3: more than 200 results 
• Condition 4: three consecutive occurrences of Condition 2 and/or Condition 3 


Condition 1 is considered to be the normal condition. In this condition, the user retrieves 1-200 results. We 
consider this to be a successful search. Condition 2 occurs when a user receives zero results in response to 
a search query and Condition 3 occurs when a user receives more than 200 results. In this study, we consider 
Condition 2 and Condition 3 as undesirable search results and define them as search failures (Bates, 1984). 
Condition 4 is triggered by any combination of Condition 2 and/or Condition 3 happening three times in a 
row. 
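
The four conditions described above can be sketched as a small classifier. This is only an illustration of the stated rules, not the sVR implementation: the names `classify_search` and `ConditionTracker` are hypothetical, and the sketch assumes that a successful search resets the count of consecutive failures and that a fourth or later consecutive failure still counts as Condition 4.

```python
MAX_SUCCESSFUL_RESULTS = 200        # upper bound of Condition 1 (from the text)
FAILURES_FOR_CONDITION_4 = 3        # three failures in a row (from the text)


def classify_search(result_count: int) -> int:
    """Return Condition 1, 2, or 3 for a single search."""
    if result_count < 0:
        raise ValueError("result_count must not be negative")
    if result_count == 0:
        return 2                    # zero results
    if result_count > MAX_SUCCESSFUL_RESULTS:
        return 3                    # more than 200 results
    return 1                        # 1-200 results: successful search


class ConditionTracker:
    """Tracks consecutive search failures within one session."""

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def update(self, result_count: int) -> int:
        condition = classify_search(result_count)
        if condition == 1:
            # Assumption: a successful search resets the failure streak.
            self.consecutive_failures = 0
            return 1
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURES_FOR_CONDITION_4:
            return 4                # repeated failure: critical situation
        return condition


tracker = ConditionTracker()
conditions = [tracker.update(n) for n in [50, 0, 500, 0, 10, 0]]
print(conditions)
```

In this example the third failure in a row (zero, more than 200, zero) triggers Condition 4, and the successful search that follows returns the interface to Condition 1.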

These four conditions are associated with Brajnik et al.’s theory of critical and enhanceable 
situations (2002). An isolated search failure, Condition 2 or Condition 3, is most closely linked to Brajnik 
et al.’s enhanceable situation in which a user may still try to improve the results by himself or herself. 
Consecutive failure, as in Condition 4, is treated more as a critical situation in which virtual reference 
service might be appreciated by the user. 


[Screenshot: PantherCatalog search page after a query that returned zero results.]


Figure 1: Condition 2: zero results. The virtual reference service button is displayed at the top of the page 


[Screenshot: the “Need help? Ask a Librarian.” link shown next to the “Click here. Ask a Librarian.” button, annotated “Red border and double-sized”.]


Figure 2: Condition 4: Three consecutive unsuccessful searches. The virtual reference service appears in 
red and its size is doubled 


3 Methodology 


Four computer stations were used in this study. Three computers had the sVR design installed, and one 
machine had the control interface. The control and experimental interfaces were identical prior to entering 
a search query. The only difference in the machines was that the sVR machines displayed the virtual 
reference service link according to the search success, and the control machine displayed the same virtual 
reference service link regardless of the level of the search success. The link to the virtual reference service 
on the control interface was the same as the link on the new sVR interface’s Condition 1: 1-200 results. The 
four machines were placed in a cluster on the first floor of the library in an area that was relatively far 
from the reference librarians and out of their line of sight. This location was chosen to increase the likelihood 
that users would use the virtual reference service for help rather than asking a librarian face-to-face. 

In general, when a user is searching for a resource, multiple searches may apply, resulting in a set 
of search actions that are considered a single search session. In this study, we determine session boundaries 
with a two-step process. In the first step, session boundaries are determined automatically based on three 
scenarios. In the first scenario, if less than one minute elapsed between actions, the system always treated 
the new action as part of the same search session. 
In the second scenario, a new Session ID was generated whenever five minutes had elapsed without action 
from the user. This is in line with the findings of Spink and Jansen, who found, in regard to search 
sessions, “...a substantial percentage [of sessions] lasting less than 5 minutes” (2004, p. 121). The third 
scenario occurred when one to five minutes had elapsed since the previous action. In this scenario, the 
system compared the two search queries. If at least 50% of the search terms in the two queries matched 
one another, the same Session ID was maintained. If not, a new Session ID was assigned. In the second step 
of the two-step session boundary detection process, the researchers manually checked and validated the 
results of the first step. 
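
The automatic first step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system used in the study: the function names are hypothetical, timestamps are given in seconds, queries are split on whitespace, and the “at least 50% of the search terms matched” rule is interpreted as the number of shared terms divided by the size of the larger term set.

```python
from typing import List, Tuple

SAME_SESSION_GAP = 60       # under one minute: always the same session
NEW_SESSION_GAP = 300       # five minutes or more: always a new session
MIN_TERM_OVERLAP = 0.5      # 50% threshold for gaps in between


def term_overlap(query_a: str, query_b: str) -> float:
    """Share of matching terms (assumed definition: shared / larger set)."""
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / max(len(terms_a), len(terms_b))


def continues_session(gap_seconds: float, prev_query: str, new_query: str) -> bool:
    """Apply the three scenarios to decide whether the session continues."""
    if gap_seconds < SAME_SESSION_GAP:
        return True                                   # scenario 1
    if gap_seconds >= NEW_SESSION_GAP:
        return False                                  # scenario 2
    # Scenario 3: one to five minutes, so compare the two queries.
    return term_overlap(prev_query, new_query) >= MIN_TERM_OVERLAP


def assign_session_ids(actions: List[Tuple[float, str]]) -> List[int]:
    """Assign a Session ID to each (timestamp_seconds, query) action."""
    session_ids: List[int] = []
    current_id = 0
    for index, (timestamp, query) in enumerate(actions):
        if index == 0:
            current_id = 1
        else:
            prev_timestamp, prev_query = actions[index - 1]
            if not continues_session(timestamp - prev_timestamp, prev_query, query):
                current_id += 1
        session_ids.append(current_id)
    return session_ids


log = [(0, "library science"), (30, "cats"), (150, "cats history"),
       (300, "dogs"), (700, "dogs")]
print(assign_session_ids(log))
```

The output of such a routine would still need the manual validation described as the second step, since term matching alone cannot tell whether two different queries belong to the same information need.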


4 Results and Analysis 


Data was collected over eight months, from November 2012 to June 2013. The data was processed and cleansed to 
remove meaningless data such as records without search terms or test data that had been input by the 
researchers. After the data was processed and cleansed, there were a total of 587 valid actions which 
comprised 191 sessions. Of the total number of search sessions, 117 sessions were performed on the 
experimental interface, while 74 search sessions took place on the control interface. This translates to 38.74% 
of the sessions having been performed on the control interface and 61.26% of the search sessions having 
been performed on the experimental interface. Actions were coded into three types of actions: entering a 
search query (370 instances); clicking an individual search result to display the full bibliographic record 
(214 instances); clicking the virtual reference service link to request the assistance of an online librarian (3 
instances). 
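
The reported shares can be checked directly from the counts given above. The snippet below is only a worked arithmetic check; the variable names are illustrative, and the mean it computes corresponds to the 3.07 actions per session reported in this section.

```python
# Counts taken from the text.
experimental_sessions = 117
control_sessions = 74
total_sessions = experimental_sessions + control_sessions

actions_by_type = {
    "search query": 370,
    "full record view": 214,
    "virtual reference click": 3,
}
total_actions = sum(actions_by_type.values())


def share(part: int, whole: int) -> float:
    """Percentage of the whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(total_sessions)                                  # 191
print(total_actions)                                   # 587
print(share(control_sessions, total_sessions))         # 38.74
print(share(experimental_sessions, total_sessions))    # 61.26
print(round(total_actions / total_sessions, 2))        # 3.07
```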

The number of actions performed within each search session ranged from one action to a maximum 
of 20 actions in a given session, with the average number of actions being 3.07 actions per session. The 
average amount of time spent per session using the system was 2 minutes and 4 seconds. We clustered the 
sessions into three groups. Group 1 sessions had 1.38 actions per session on average, with an average of 
33.80 seconds spent searching. In general, this group of users exhibited a “hit & run” behavior. Whether they 
succeeded or failed at their search is unknown. If they succeeded, they can be said to be either expert 
users or lucky. If they failed, they were impatient. On average, Group 2 users performed 4.79 actions per session 
and spent 4 minutes and 6 seconds using the interface. This group spent more time and performed more 
actions than Group 1. This group comprised 35.6% of all the sessions (see Fig. 3). All three virtual reference 
clicks came from these users. The Group 3 users performed a relatively large number of actions per session. 
They averaged 16.5 actions per session and spent on average 8 minutes and 30 seconds using the interface. 
This group of users tended to continue to try to find the resources they desired by themselves without 
seeking help. 


iConference 2014 Joel DesArmo et al. 


[Bar chart: the number of occurrences (y-axis) grouped by the number of actions per session (x-axis, 1 to 20).]

Figure 3: Number of actions per session 

[Bar chart: the average session time (y-axis) grouped by the number of actions per session (x-axis, 1 to 20).]

Figure 4: Average session times relative to the number of actions per session 


In this study, 38.74% of the search sessions occurred on the control interface and 61.26% of the search 
sessions occurred on the sVR interface. Virtual reference assistance was never requested when using the 
control interface. The experimental interface logged three unique requests for virtual reference service. 

In Case #1, the virtual reference service usage was a result of condition 2: zero results. In Case #2,
virtual reference service usage was a result of condition 3: a large number of results. Virtual reference service
usage in Case #3 was also a result of condition 2: zero results. So while we are not able to provide a
statistically significant conclusion to our research goal of examining whether the new sVR interface will
increase usage of virtual reference service, it is noteworthy that all of the instances of virtual reference
service use in this study occurred under conditions in which the user was experiencing an undesirable range
of search results as defined within this study. The users who clicked the virtual reference service link all came
from Group 2 as described in Fig. 3. One possible explanation for why users from this group sought out
virtual reference is that these users do not give up as easily as those in Group 1, yet do not seem to be users
who want to do their searches entirely by themselves, as those in Group 3 do.
5 Conclusions


This study has designed and implemented a new situational virtual reference service interface to address 
issues of the availability of virtual reference service. The new design attempts to enhance the offer of virtual 
reference service within a search session to users who need it, with a secondary goal of being non-intrusive. 
Our new system includes three conditions under which a user may experience frustration: no results, too 
many results, and repeated search failure. By examining scenarios in which users elected to use virtual 
reference service, we obtained findings that provided insight into factors contributing to virtual reference 
service usage. Each case illustrates a condition under which users may be experiencing search frustration 
and need help. Finally, our study findings indicated that all virtual reference service requests came from 
our improved interface. We believe this study sheds light on some important issues related to increasing 
the usage of virtual reference service. A future study will use follow-up interviews and eye-tracking tools to 
further understand the conditions under which to offer virtual reference service. 
Figure 1: Condition 2: zero results. The virtual reference service button is displayed at the top of the page
Abstract

This article presents a condensed evolution of the interdisciplinarity of Biochemistry & Molecular Biology 
(BMB) by showing its chronological sequence of cited disciplines from 1910 until 2012. Interdisciplinary 
research has become a general approach for solving complex problems in modern science. The results of 
our research might help policy makers and funding agencies understand more comprehensively the 
interdisciplinary nature of specialties and disciplines. Using science overlay maps based on the NSF
classification system, our analysis confirms that interdisciplinarity begins with neighbouring fields and
evolves to more distant cognitive areas.
1 Introduction


Interdisciplinary research has become a general approach for solving complex problems in modern science, 
and, as such, has been encouraged by science policy (Rafols & Meyer, 2007) both by creating 
multidisciplinary centers and by funding multidisciplinary research projects (Bordons, Zulueta, Romero, & 
Barrigón, 1999). Porter and Rafols (2009) mention that interdisciplinary research seems almost universally 
acclaimed as “the way to go.” Measuring and understanding interdisciplinary research is a topic that has
recently received more attention in bibliometric studies.

In an attempt to answer the question “Is science becoming more interdisciplinary?” Porter and
Rafols (2009) investigate the evolution of interdisciplinarity over a thirty-year period across six research domains.
The results attest to notable changes in research practices over that period, specifically in the
number of cited disciplines, references per article, and co-authors per article. However, the Rao-Stirling
Index shows only a modest increase. The authors suggest that this is because the citations
of an article remain mainly within neighbouring disciplinary areas. Similarly, Larivière and
Gingras (2014) show that, after declining between 1945 and 1975, interdisciplinarity has been rising, mainly 
at the expense of the focus on specialties. Another recent study by Levitt, Thelwall, and Oppenheim (2011) 
analyzed the evolution of interdisciplinarity in the Social Science Citation Index (SSCI) categories for three 
specific years (1980, 1990 and 2000). The authors showed that the median level of interdisciplinarity of 
these fields had decreased between 1980 and 1990, but then climbed back in 2000 to its 1980 level. Mansilla, 
Feller, and Gardner (2006) analyzed changes in level of interdisciplinarity between 1985 and 1995 and found 
that very few disciplines displayed significant changes in levels of interdisciplinarity during that time. Chang 
and Huang (2012) used three bibliometric methods (direct citation, bibliographic coupling, and
co-authorship analysis) to investigate interdisciplinary changes in Library and Information Science over the
past thirty years. This article presents a condensed evolution of the interdisciplinarity of Biochemistry &
Molecular Biology (BMB) by showing the chronological sequence of cited disciplines.


2 Data and Method 


2.1 Data Collection 


A total of 1,539,526 Biochemistry & Molecular Biology (BMB) papers (published between 1910 and 2012),
along with 40,855,852 corresponding references (journal articles only), were obtained from Thomson Reuters
Web of Science, covering over one century of both papers and references. The disciplinary classification
of journals used in this paper is that of the U.S. National Science Foundation (NSF), which categorizes each
journal into one discipline and one specialty. The main advantage of this classification over that of Thomson
Reuters is that it classifies each journal into one discipline only, thus avoiding multiple counting of
publications. This classification includes 143 specialties spanning fourteen disciplines. The paper mainly
explores the evolution of interdisciplinarity by looking at the time at which different disciplines are first
cited by BMB from 1910 to 2012. One limitation of this study is that this classification represents the
current state of journal classification, which might or might not be stable through time. That being said,
given that very few disciplines have ceased to exist, while many disciplines were created, using the
current classification should not introduce much anachronism. Further analysis with chronological sampling
would be a useful addition.


2.2 Science Overlay Map Based on the NSF Classification System 


Maps of science allow visualization of elements (usually scientific disciplines) and the relationships that
exist between them (Klavans & Boyack, 2009). A science overlay map is an efficient tool for displaying disciplinary
distribution and evolution. Because local science maps are problematic for comparisons, since they are
not stable in the units or positions of representation, some authors recommend using science overlay maps
to overcome this problem (Rafols, Porter, & Leydesdorff, 2010; Leydesdorff, Carley, & Rafols, 2013). Science
overlay maps use the units and positions derived from a global map of science, but overlay on them the
data corresponding to the organizations or themes under study.

Rafols et al. (2010) generate a matrix of citing SCs (Web of Science Subject Categories) to cited
SCs using the Journal Citation Reports and construct a basemap of 221 SCs. Since we adopted the NSF
Classification System for our study, we could not use this basemap and therefore had to produce our
own. We constructed a discipline-to-discipline co-citation matrix using ten years of data (2003-2012)
from Web of Science. Salton’s cosine was used to normalize the co-citation values. We used
VOSviewer to generate a global map of science based on the NSF classification system. Figure 1 shows the
global map of science based on the 143 specialties of the NSF classification. Each node in the map represents
one specialty, its size being determined by the total citations received throughout the entire period
analyzed. The relative positions of the specialties are determined by their similarity, based on the VOS MDS
algorithm (van Eck & Waltman, 2010; van Eck, Waltman, Dekker, & van den Berg, 2010). The colours in
figure 1 correspond to the fourteen NSF disciplines.
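
The normalization step mentioned above can be illustrated with a minimal sketch. It assumes one common form of Salton's cosine for co-citation data, in which each raw co-citation count is divided by the square root of the product of the two units' total citation counts, and it assumes those totals are stored on the diagonal of the matrix. The function name and the toy matrix are hypothetical and are not taken from the study's data.

```python
import math


def salton_cosine(cocitation):
    """Normalize a symmetric co-citation matrix with Salton's cosine.

    cocitation[i][j] is the raw co-citation count of units i and j.
    Assumption: the diagonal cocitation[i][i] holds the total citation
    count of unit i, so that s_ij = c_ij / sqrt(c_ii * c_jj).
    """
    n = len(cocitation)
    normalized = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denominator = math.sqrt(cocitation[i][i] * cocitation[j][j])
            if denominator > 0:  # leave 0.0 for units that are never cited
                normalized[i][j] = cocitation[i][j] / denominator
    return normalized


# Toy 3 x 3 matrix with made-up counts, for illustration only.
toy_matrix = [
    [100, 30, 5],
    [30, 90, 10],
    [5, 10, 25],
]
similarity = salton_cosine(toy_matrix)
```

The resulting values lie between 0 and 1, with 1 on the diagonal, which makes large and small specialties comparable before the map layout is computed.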
Figure 1: Global map of science based on the NSF classification system


In the upper part of figure 1 we find Biomedical Research, Clinical Medicine and Biology, whereas in the 
bottom left we find Social Sciences, Psychology, Professional Fields, Humanities and Art. The bottom right 
consists of Chemistry, Physics, Engineering and Technology, Earth and Space Sciences and Mathematics. 
Health is found between the upper and bottom left. The global science map from NSF classification is 
similar to the global science map reported in Rafols et al. (2010) and the consensus map of science produced 
by Klavans and Boyack (2009, p. 469). 


3 Results 


3.1 Interdisciplinary Evolution Trace of BMB 


Our analysis shows that the number of specialties cited in Biochemistry & Molecular Biology (BMB) grows
from 1 to 93 (spanning 12 NSF disciplines) over the one-hundred-year period under study. This is typical
of an evolution toward interdisciplinarity. It is interesting to present the chronological sequence of
disciplines cited by BMB researchers. We chose to display the evolution of interdisciplinarity at the level of
NSF disciplines and then to focus on some benchmarks in the development of BMB as a discipline.
Let us consider the core disciplines in the emerging sequence of interdisciplinary relations. Figure 2
illustrates the chronological sequence of disciplines. The year of appearance corresponds to the time the
discipline reached a certain significance in terms of citations received (250 citations was deemed significant).
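
The threshold rule described above can be sketched as follows. This is only an illustration under a stated assumption: the text does not say whether the 250 citations are counted cumulatively or within a single year, and the sketch assumes a cumulative count. The function names and the toy counts are hypothetical.

```python
def first_significant_year(citations_by_year, threshold=250):
    """Return the first year in which cumulative citations reach the threshold.

    Assumption: citations are accumulated over time; returns None if the
    threshold is never reached.
    """
    total = 0
    for year in sorted(citations_by_year):
        total += citations_by_year[year]
        if total >= threshold:
            return year
    return None


def emergence_sequence(counts, threshold=250):
    """Order disciplines by the year in which they first reach the threshold."""
    years = {name: first_significant_year(by_year, threshold)
             for name, by_year in counts.items()}
    reached = {name: year for name, year in years.items() if year is not None}
    return sorted(reached.items(), key=lambda item: item[1])


# Made-up counts for three placeholder disciplines, for illustration only.
toy_counts = {
    "Discipline A": {1910: 300},
    "Discipline B": {1920: 100, 1924: 200},
    "Discipline C": {1950: 10},
}
sequence = emergence_sequence(toy_counts)
```

In this toy input, Discipline C never reaches the threshold and is therefore left out of the sequence.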

BMB is a specialty of Biomedical Research, so at first, in 1910, BMB only cited publications from the
discipline itself, and then started to cite publications from other disciplines such as Chemistry
(1924), Clinical Medicine (1937) and Biology (1949). These three NSF disciplines are close to BMB on the
global science map based on the NSF classification system (refer to figure 1). With the development of
BMB, publications from more distant disciplines such as Physics emerged in the referenced disciplines
in 1961. Psychology, Earth and Space Sciences, Engineering and Technology, and Mathematics also appeared
one after the other in the referenced disciplines between 1982 and 1993. It is worth noting that two
“professional fields” (Library and Information Science, Management) and Social Sciences, which may be
considered distant disciplines, emerge in the referenced disciplines in 2003 and 2012 respectively, probably
reflecting the recent debates on the uses of bibliometrics to evaluate research in BMB.


Disciplines in order of appearance (specialty in parentheses; year labels given where legible):

Biomedical Research (Biochemistry & Molecular Biology)
Chemistry (General Chemistry)
Clinical Medicine (General & Internal Medicine)
Biology (Botany)
Physics (Chemical Physics)
Psychology (Behavioral Science & Complementary Psychology)
Earth and Space (Environmental Science)
Engineering and Technology (Computers)
1993: Mathematics (Probability & Statistics)
2003: Professional Fields (Information Science & Library Science); Health
Social Sciences (Economics)
Professional Fields (Management)

Figure 2: Discipline emerging sequence for Biochemistry & Molecular Biology


4 Discussion and Conclusion 


We examined the evolution of interdisciplinarity in Biochemistry & Molecular Biology (BMB) over a 
century using the disciplines of papers referenced in BMB journals. This study confirms that 
interdisciplinarity begins with neighbouring fields and evolves to more distant cognitive areas. These results 
might help policy makers and funding agencies understand more comprehensively the interdisciplinary 
nature of specialties and disciplines. 
This poster presents research-in-progress which seeks to evaluate a proposed theoretical framework in the
context of unequal access to research results and data with national security implications among 
collaborators. The study surveys faculty and students in US university laboratories which receive federal 
1 Introduction


Digital open access to publicly funded research data and papers presents an interesting area of confluence
between library and information science (LIS) and government accountability (Peled, 2011; Sprehe, 1999).
Research has shown that compliance with open information policies is unequal across United States
government agencies, with instances of low compliance often attributed to national security
interests, such as in the Department of Defense (Oltmann, Rosenbaum, & Hara, 2006). Barriers to access
on national security grounds may have benefits, but there is considerable evidence that access to information
classified in this way often benefits research in this domain (Kramer, 2010; Rao & Singh, 2007).

Despite clear documentation that research with security implications may only be disseminated
on a limited basis and that government classifications govern the work of researchers who receive federal
funding, it is not entirely clear what impact barriers to access have on research practices or how often
researchers encounter these barriers.


2 Theoretical Framework 


Policy and technology interact in complex ways to impact access to publicly funded research (Bélanger,
2009), and the environment is most complex when information under government control has security 
implications (e.g. Carr, Henchal, Wilhelmsen, & Carr, 2004). Previous scholarship has evaluated digital 
access and digital open access from many scholarly perspectives, focusing on policy (e.g. Câmara & Fonseca, 
2007; Feltren, 2012), technology (e.g. Bélanger, 2009; Kim, 2010), and implementation (e.g. Marcial & 
Hemminger, 2010; Miguel, Chinchilla-Rodriguez, & de Moya-Anegón, 2011) under various criteria, which 
can be categorized as: success and failure (e.g. Joseph, 2008; Marcial & Hemminger, 2010), infrastructure 
(e.g. Joseph, 2008; Kumpulainen & Järvelin, 2012), scope (e.g. Davis, 2008; Gostojić, Sladić, Milosavljević, 
& Konjović, 2012; Sprehe, 1999), and accessibility (e.g. Davis, 2008; Joseph, 2008; Kumpulainen & Järvelin, 
2012). A framework has been developed through synthesis of the literature that characterizes each parameter 
within these categories and identifies the breadth and depth of scholarship in each area (Under Review). 
The utility of this framework lies in its integration of multidisciplinary perspectives, as many past
studies have neglected the dynamic nature of the relationship between policy and technology with respect
to their impacts on access. This work in progress will also seek to provide an integrated socio-technical
perspective.

In applying this framework, this study seeks to answer the following questions: How do policy
and technology control access to research data resulting from projects funded by the Department of Defense,
National Security Agency, and Department of State? How prevalent are barriers to access? How do socio-technical
controls on access create unequal access within collaborative research? Are there patterns of
unequal access regarding collaborative hierarchies?


3 Methodology 


In order to answer these questions, an online survey will be distributed to all principal investigators, as well 
as independently funded students and post-doctoral scholars, listed as currently operating on federal grants 
from the Department of Defense, National Security Agency, and Department of State. A preliminary letter 
of information regarding the nature of the study and its goals will comprise the body of the email. The 
online survey will be prefaced by the statement of informed consent, asserting the anonymity of all 
participants and their responses, which must be completed prior to viewing the survey. 

Questions will be divided into three sections: the first will assess controls exerted on respondents’ own
research, the second will assess barriers to access encountered in their information-seeking processes and
those within their laboratories, and the third will request information about the respondent’s experience as
a scientist, so as to characterize who is most impacted by barriers to access. Sections one and two will
include a variety of Likert-scale, open-ended, and multiple-choice questions; for the multiple-choice
questions, respondents can choose all that apply.

In the first section, questions regarding grant restrictions on storage and dissemination of results 
will determine how policy controls access to information from the point of view of the researcher. This self- 
reported information will be compared to actual policy documents pertaining to data and publication. 
Questions regarding the normative practices for storage and dissemination will elucidate the compliance 
with policy barriers to access. Data gathered from these questions will address policy barriers, as well as 
how policies lead researchers to use technological constraints to control access. 

In the second section, questions regarding the experiences of respondents and members of their 
laboratory in seeking information will assess how access is constructed, socially and technologically. 
Questions will specifically address: ease of access, problems experienced, limits to access, and strategies 
employed to access restricted data or publications. From results gathered through this line of inquiry, it 
will be possible to assess: the extent to which policy barriers successfully constrain access; social and 
technical barriers to information sharing and access; and relative levels of inaccessible information sought. 

Following these categorical questions, which are key to the research questions, a final section will ask respondents
to identify more personal characteristics; it is placed last to follow conventions regarding
demographic information gathering (Dillman, 1978). Participants will be asked to: identify their role in
research (principal investigators; post-doctoral researchers; doctoral students; master’s students; 
undergraduate students), the duration for which they have worked on their current project, and the duration 
for which they have worked on federally funded research. These questions will help to characterize inequality 
of access within collaborative research. Furthermore, respondents will be asked to indicate whether they 
would be willing to participate in a follow-up interview. 

In addition to statistical assessment of correlations between accessibility issues, controls, and 
researcher attributes, in terms of hierarchy and experience, and qualitative assessment of open-ended 
descriptions of access problems and subsequent interviews, contextual policy analysis of funding agencies’ 
guidelines and restrictions will be employed to evaluate these research questions. 
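
As an illustration of the kind of correlation assessment described above, the following minimal sketch computes a Spearman rank correlation between two ordinal variables. The choice of Spearman's coefficient is an assumption made for this example, since the study does not name a specific statistic. The variable names and the toy responses are hypothetical and are not survey data.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: role level (1 = undergraduate ... 5 = principal
# investigator) and a 1-5 Likert rating of how often barriers are encountered.
role_level = [1, 2, 3, 4, 5]
barrier_rating = [5, 4, 4, 2, 1]
rho = spearman(role_level, barrier_rating)
```

A rank-based coefficient is used in this sketch because both the role hierarchy and Likert ratings are ordinal rather than interval measures.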
4 Conclusion


This research seeks to assess levels of access and identify where access ought to be increased, either through 
improved compliance with open access policy or identification of secure access mechanisms for collaborative 
research groups. Improved facilitation of research with security implications is important to efficiently and 
effectively design projects in this domain. 
1 Introduction


Distance education has achieved tremendous success in many iSchools in America. According to the
directory of the American Library Association, 22 Library and Information Science (LIS) master’s
programs are currently offered completely online and 12 programs allow students to take the majority of classes online
(American Library Association, 2013). In order to sustain the growth of distance programs in information
science, it is crucial to explore innovative pedagogy that will attract and retain students (Aversa and
MacCall, 2013).

The MOOC, or Massive Open Online Course, is an emerging method of online delivery that has gained
much attention from the educational community. A MOOC is “a model for delivering learning content
online to any person who wants to take a course, with no limit on attendance” (EDUCAUSE, 2013).
A MOOC’s characteristics include massive enrollment with free online access, video-supported lectures with
embedded online quizzes, and peer and self-assessment to support learning (Glance, Forsey, and Riley,
2013).

MOOCs provide free online courses from institutions that offer face-to-face courses, and some
institutions are considering adopting MOOCs as a course management system and offering for-credit courses
to support independent learning (Moore, 2013). Assuming the same level of knowledge is offered, free online
courses seem more attractive to potential students. While it is too early to predict whether free courses will
endanger the existence of LIS programs, it is valuable to identify the pedagogical innovations of MOOCs,
contribute to the development of such technologies, and apply them to iSchool education in terms of course
redesign.

This study addresses the following research questions: 


1. What are innovative pedagogical approaches in Massive Open Online Courses, particularly courses
offered through Coursera, Udacity, and edX?
2. What pedagogical approaches can be beneficial to the redesign of iSchool online courses? 


This poster analyzes the literature related to pedagogical approaches in MOOCs in order to inform
the redesign of current LIS online courses. Because this study is, by nature, a comprehensive review, the
results are preliminary and need to be studied further using experimental research.
2 Methodology


A narrative analysis of recent publications related to MOOCs was conducted. The databases searched were Academic
Search Complete, ERIC, and Library & Information Science Source from EBSCOhost. In addition, the websites
of three major MOOC providers (Coursera, 2013; Udacity, 2013; and edX, 2013) were selected to study their
pedagogy. Additional sources from the EDUCAUSE conference (www.educause.edu) were also considered.


3 Data Analysis 


As of July 15, 2013, there were 344 publications in total from the three databases: 279 from
Academic Search Complete (33 of them journal articles), 37 from ERIC (17
journal articles), and 27 from Library and Information Science Source (3 magazine articles). It is
noteworthy that the three publications in library and information science are all informational in nature and fewer
than three pages long. After a preliminary analysis of current courses offered on different MOOC platforms, a
comparison of the pedagogies of three MOOC platforms is listed in Table 1. Blackboard CourseSites, a
MOOC platform, shares the same design as Blackboard 9 and was not included in this study.


Name      Pedagogy           Activities
Coursera  Interactive video  Short class videos with embedded quizzes
          Mastery learning   Students can re-study and re-attempt homework
          Peer assessments   Students grading peer papers and projects
          Active learning    Interactive engagement between faculty, students, and between students and their peers
Udacity   Interactive video  YouTube videos with embedded quizzes
          Mastery learning   Students can re-study the videos and re-attempt the quizzes
          Online activities  Students can answer questions for each other; live Q/A sessions from the professor
edX       Interactive video  Live class videos followed by quizzes and discussions
          Mastery learning   Students can re-study and re-attempt the homework
          Online discussion  Students can answer questions from each other


Table 1: Pedagogies of Major MOOC Models 


The best MOOC courses are evidently those with good-quality videos and interactive quizzes (see Figure
1). The designers of those courses and their host institutions have likely invested heavily in the production of
these high-quality videos.
Almost all MOOC courses use discussion boards, but the interaction between students and professors varies
(see Figure 2). Some courses rely on students to answer questions, while in other courses professors
host question-and-answer sessions regularly. If a class has a huge enrollment, the philosophy of
“crowd-sourcing” may allow students to help each other without the need for instructors to moderate the
classes intensively or to use the help of a teaching assistant. However, without data from real courses, this
author does not know how much time a MOOC professor invests in the course each week.
The copyright status of MOOC course materials is still a gray area. Copyright restrictions forbid
instructors from using materials otherwise available to registered students in a regular accredited institution
of higher education.

Among the three platforms, Udacity classes have open enrollment without deadlines while others 
have starting and ending dates (Round, 2013). 


Figure 2: Discussion forums in a Udacity course


4 Open Source MOOC Movement 


Although MOOC courses are free, the course management systems themselves are largely not. A more
recent development is that MOOC platforms are gradually moving into open source territory. For example,
edX is moving in an open-source direction and has published part of its code for free download (Inside Higher
Ed, 2013). The ELMS initiative, or e-Learning Management System, is a Drupal-based open source project
created by Pennsylvania State University (https://drupal.org/project/elms). The project released its latest
version for Drupal 7 as of September 16, 2013. The software provides a platform to build either closed-enrollment
courses that are integrated with campus management systems, or completely open courses as open
education resources. The next step of this study is to install the Drupal-based ELMS and experiment with
the use of such a learning management system with graduate students. The usability and functionality of open
source MOOC environments will be reported.


5 Conclusion 


MOOCs adopt some innovative teaching methods and seem attractive to certain non-traditional distance
learners. The crowd-sourced learning model can potentially allow instructors to design courses to which
anyone with domain knowledge in a certain subject can contribute, without being restricted to
a certain faculty member and/or teaching assistants. Inside the classroom, the interactive videos allow
students to learn at their own pace and can be watched repeatedly for the sake of mastery learning, which
is less cost-effective than synchronous online courses taught by faculty. The grading rubrics reduce the
workload and help instructors use computers to grade assignments automatically.

For students whose learning goal is to acquire knowledge from experts, rather than to obtain a
diploma, MOOCs seem to be a promising venue to meet their goal. MOOCs can be a public relations tool
for prestigious universities to attract potential students, audiences, and funding support. If integrated with
academic services such as admission, advising, quality control, library services, and technical support,
MOOCs could be a substantial competitor in the current higher education enterprise. Cusumano (2013)
cautioned that while elite institutions offer free online courses that are heavily subsidized by tuition from
on-campus students, private endowments, and government support, for-profit and second- and third-tier
universities and colleges may be in danger of closing.

At this point, there is no indication that MOOCs will threaten the existence of online programs currently
taught in iSchools. The for-credit online courses taught by regular faculty in these schools are rigorous in
design and management in order to meet their parent institutions’ accreditation or educational standards.
However, facing the challenges of economic downturn, student recruitment, enrollment, and the increased cost
of course management and support, online programs may lose students to MOOCs and become obsolete in
the future. iSchool faculty should pay attention to this emergent competitor and redesign their courses to meet
such challenges, or at least borrow the ideas of crowd-sourcing, interactive videos, mastery
learning, peer assessment, and live online activities in course redesign to improve the peer interaction in
their classes and to attract and retain more students in their fields.
Abstract

In Korea, the rapid development of information and communication technologies and the aging of the 
society have caused a wide information gap between generations and the effective exclusion of the elderly. 
As such, this research attempts to discover ways in which the modern elderly in Korea’s metropolitan 
areas seek information, as well as factors that influence their attitudes. The results of this field study 
show that the elderly people value interpersonal relationships most when seeking information. They 
actively seek information from human information sources, which in turn triggers further information 
seeking. Seniors do use digital devices, but there is a high barrier that prevents them from actively using 
digital devices to locate information. These findings offer meaningful insight with which to
investigate “new seniors,” the generation of seniors who possess digital literacy, in the future.
1 Introduction


Korea has been experiencing population aging at unprecedented speed, which has created various social
concerns. Consequently, social interest in seniors has increased. At the same time, the information gap
between generations has widened amidst the rapid development of information and communication technologies
(Dewan & Riggins, 2005).

The majority of Korea’s elderly have typical characteristics. They have devoted their lives to 
supporting their children, and as a consequence, they have not been adequately prepared for their senior 
years. In addition, these seniors have relatively low levels of education and have not had access to digital 
devices until reaching their senior years, so their learning capabilities for digital devices are low. This low 
digital literacy gives rise to an information gap, contributing to socioeconomic inequality and obstruction 
of social integration (Norris, 2003). 

To close this information gap, it is critical to understand the ways that senior citizens seek out and 
obtain information. The elderly often prefer face-to-face communication, which also improves their self- 
esteem (Vela-McConnell, 1999). We observed and interviewed some Korean seniors in order to understand 
their information-seeking patterns. Based on the findings, we are proposing an information-seeking behavior 
model for Korean seniors. 


2 Research Setting 


The subjects of this research are the seniors in the Budnae Senior Center located in Suwon, a metropolitan 
area in Korea. Most participants had few restrictions on their physical activities and communication 
capability. The center had various information media sources, such as newspapers, TVs, and personal 
computers (PCs). The senior center collaborates with several voluntary organizations, including churches 
and schools. For example, the schools offer informational and educational services to the center and seniors 
from the center provide lectures about gardening. Vibrant information-seeking activities take place there. 

The senior center is similar to YMCA community centers in the United States, which act as social 
welfare institutions helping seniors to socialize. However, unlike the YMCA’s paid membership and program- 
oriented activities, general Korean institutions for seniors have strong welfare characteristics with 
government financial support. The centers usually offer free memberships and provide gathering places for 
seniors with low incomes. 


3 Method 


We observed the seniors in their natural settings and conducted individual semi-structured interviews to 
identify the factors influencing their information-seeking activities. 


3.1 Observations 


We observed seniors in the center for a total of 6 hours, 3 hours each day on two weekdays. There were 
approximately 200 seniors in the center when we were observing. Seniors were participating in daily 
activities such as playing Go and Pocket Ball. Observations were conducted using noninvasive methods to 
accurately understand the contexts. 

During the observations, we found that seniors frequently talked about health and hobbies. As they 
discussed this information, the seniors took on various roles, acting both as information sources and as 
triggers for additional information needs. 


Figure 1: The Senior Observation (Playing Go and Pocket Ball) 


3.2 Interview 


Interviews were conducted based on the observation results to confirm seniors’ information-seeking behavior. 
We recruited 7 interviewees with the help of the center. We asked the seniors what kind of information 
they needed regarding health and hobbies. We also asked how they pursued their information needs and 
why they needed the information. In-depth one-on-one interviews were held with 3 women and 4 men in 
their seventies. Each interview was about 30 minutes long. 


4 Findings 

The study results showed that seniors mainly needed information about health and hobbies. Preferred sources 
for the information were people, TVs, and newspapers. This result reconfirmed previous research findings 
that showed that seniors need information about health, hobbies, and entertainment programs (Kim et al., 
2011). 

The interviews revealed that the seniors’ main source of information was their fellow seniors. These 
interactions with other seniors enhanced their interpersonal relationships and self-satisfaction. The seniors 
also showed strong information needs with regard to their particular individual interests. However, they 
tended to avoid unfamiliar topics, because of high entry barriers to advanced information devices. 


4.1 Lonely but Active to Seek Information 


The Budnae Senior Center served about 25,000 senior members in their 60s, 70s, and 80s.¹ The observation 
subjects were physically and mentally healthy, but most had no job and had not prepared for their senior 
years. Today’s seniors (aged 65 and older) often have family-centered lifestyles and low levels of education. 
Many of them live separately from their children, without a spouse. Loneliness is a common problem for 
these seniors. 


“It’s very painful for someone to sit idle and do nothing. I can’t just die either, right? That’s why 
I come here every morning and get along with people. Loneliness is the most fearful thing for 
someone living alone. Like me.” 


Furthermore, the seniors’ use of information is limited compared to that of younger generations. 
Although Internet use, generally, has been found to strengthen people’s existing interpersonal relationships 
and maintain friendship networks (Valentine & Holloway, 2002), most seniors are not aware of the benefits 
of social networking services. However, the desire for acquiring and learning customized information is 
actually as high as that of the younger generation. We also discovered that the seniors tended to prioritize 
happy lives and pursue a variety of information, improving social relationships. 


4.2 People as Primary Information Resources 


The most noticeable information-gathering characteristic of the seniors was that people, particularly their 
peers, were the key sources, as well as triggers, that created information needs. 

One reason for this might be the role of Korean culture in interpersonal relationships. The seniors 
share information through interpersonal interactions, which also reconfirms the meaning of their existence 
and raises their self-esteem (Sung, 1991). This is significantly important for the seniors who show relatively 
low levels of social participation, compared to people of younger generations (Armbruster & Armstrong, 
1993; Sagiv & Schwartz, 2000). 

However, seniors have difficulty acquiring information through digital devices. From the interview, 
we learned that this is due to the high costs of learning and the psychological burdens from their low 
learning efficiency. According to a related survey, seniors avoid digital devices due to the difficulty of use; 
additionally, they do not feel the need for such devices (Van Dijk & Hacker, 2003). 


“Now I’m an old man. Old people don’t like complicated things. Even if smart phones were free, I 
don’t want the complications. They’re a headache. I ended up avoiding them.” 


4.3 Comparison with the Younger Generation’s Information Seeking 


The Korean seniors preferred to communicate with people who share their experience. This is because they 
tend to customize practical information based on shared contexts. Particularly in Korea, customized 
information is very useful because there are large generational gaps, for example in educational 
background (Greene, Caracelli, & Graham, 1989; Sung, 1991). 


1 http://www.budnae.or.kr/website/index.php 

Figure 2: Hierarchical Layered Structures for Information Seeking Behavior for the Seniors (Granovetter, 1973) 
Figure 3: Hierarchical Layered Structures for Information Seeking Behavior for Younger Generations (Granovetter, 1973) 


As a result, the seniors’ range of information sources was limited (low reach). However, they tended to have 
a strong relationship with their information sources (high affinity) (Turner, 2004). This shows hierarchical 
layered structures with regard to information-seeking behavior (Sung, 1991). Figure 2 shows that seniors 
often found human information resources among those with whom they had close relationships. 
Furthermore, from the interviews, we found information-seeking behaviors usually took place below the 
third layer, which corresponded to less chance of using unfamiliar information sources. 

On the other hand, most members of the younger generations showed weak interpersonal relationships, 
with large-scale complex networks (high reach, low affinity) (Turner, 2004). This expansion is made possible 
by various network-based communication tools (Dewan & Riggins, 2005). Figure 3 expresses the broad 
range of human information resources of the younger generations. However, most of the diverse relationships 
are based on weak interpersonal ties (Sung, 1991). Figures 2 and 3 imply that the number of close 
relationships is similar between the older generation and the younger generations (Greene et al., 1989; Kim 
et al., 2011). 


5 An Information-Seeking Model of Korean Seniors 


Drawing on Dervin & Nilan (1986) and Wilson (2000), an information-seeking behavior model of the Korean 
seniors is suggested based on the findings above (Figure 4). 

Information needs arise from various environmental triggers. Then, entry barriers are evaluated 
based on the individual’s contexts, followed by the decision to seek information or not. In the information- 
seeking process, appropriate information resources are searched first and some of them are selected for 
gathering information. After collecting information, seniors use the information when they need it. If it fails 
or proves inadequate or unsatisfactory during the process, jumping to any stage is always possible. 

In the case of seniors, their information sources are usually fellow seniors with a shared context. Once 
information is collected, it is either discarded or used once and memorized. The memory can be used as a 
resource or trigger, leading to another information-seeking behavior. 

Figure 4: Information-Seeking Behavior Model of Senior in Korea 


6 Conclusion 


The goal of this poster was to understand Korean seniors’ information-seeking behavior, specifically in the 
metropolitan area of Suwon. The study is therefore limited in how far it can be generalized to Korean seniors 
as a whole. Also, there are many contexts that cannot be explained by the model presented here. 

However, by understanding the seniors’ relatively limited information-acquiring opportunities, this 
research can be used to design information systems or to develop social policies to help seniors’ information- 
seeking in the future. Additionally, more specific studies on seniors’ information-seeking and information- 
sharing activities would help bridge the information gap. A transition from traditional seniors to “new 
seniors,” who possess digital literacy, has been taking place across the world. In such a transitional period, 
we expect that the findings from this study can be useful for solving the various social issues caused by the 
rapid development of information technologies and rapidly aging populations. 


7 References 


Armbruster, B. B., & Armstrong, J. O. (1993). Locating information in text: A focus on children in the 
elementary grades. Contemporary Educational Psychology, 18(2), 139-161. 

Dervin, B., & Nilan, M. (1986). Information needs and uses. Annual Review of Information Science and 
Technology, 21, 3-33. 

Dewan, S., & Riggins, F. J. (2005). The digital divide: Current and future research directions. Journal of 
the Association for Information Systems, 6(12), 298-337. 

Granovetter, M. S. (1973). The Strength of Weak Ties. American Journal of Sociology, 78(6), 1360-1380. 

Greene, J. C., Caracelli, V. J., & Graham, W. F. (1989). Toward a conceptual framework for mixed- 
method evaluation designs. Educational Evaluation and Policy Analysis, 11(3), 255-274. 




iConference 2014 Sumi Kim & Heekyung Choi 


Kim, H.-S., Harada, K., Miyashita, M., Lee, E.-A., Park, J.-K., & Nakamura, Y. (2011). Use of senior 
center and the health-related quality of life in Korean older adults. Journal of Preventive 
Medicine and Public Health, 44(4), 149-156. 

Norris, P. (2003). Digital divide: Civic engagement, information poverty, and the Internet worldwide (Vol. 
40). Taylor & Francis. 

Sagiv, L., & Schwartz, S. H. (2000). Value priorities and subjective well-being: Direct relations and 
congruity effects. European Journal of Social Psychology, 30(2), 177-198. 

Sung, K.-T. (1991). Family-centered informal support networks of Korean elderly: The resistance of 
cultural traditions. Journal of Cross-Cultural Gerontology, 6(4), 431-447. 

Turner, K. W. (2004). Senior citizens centers: What they offer, who participates, and what they gain. 
Journal of Gerontological Social Work, 43(1), 37-47. 

Valentine, G., & Holloway, S. L. (2002). Cyberkids? Exploring children’s identities and social networks in 
on-line and off-line worlds. Annals of the Association of American Geographers, 92(2), 302-319. 

Van Dijk, J., & Hacker, K. (2003). The digital divide as a complex and dynamic phenomenon. The 
Information Society, 19(4), 315-326. 

Vela-McConnell, J. A. (1999). Who is my neighbor?: Social affinity in a modern world. SUNY Press. 

Wilson, T. D. (2000). Human information behavior. Informing Science, 3(2), 49-56. 


8 Table of Figures 


Figure 1: The Senior Observation (Playing Go and Pocket Ball) 
Figure 2: Hierarchical Layered Structures for Information Seeking Behavior for the Seniors 
Figure 3: Hierarchical Layered Structures for Information Seeking Behavior for Younger Generations 
Figure 4: Information-Seeking Behavior Model of Senior in Korea 




Project Tales: Reusing Change Decisions and Rationales in Project Management 


Lu Xiao¹ and Taraneh Khazaei¹ 


¹ Human-Information Interaction Lab, University of Western Ontario 


Abstract 

Changes are inevitable in project management and project managers are often required to make change 
decisions that may have significant effects on the success of the project. To support project managers’ 
decision-making process in such common cases, we have designed and developed a tool called 
ProjectTales. This tool takes advantage of the valuable information buried in the history of projects and 
provides various visual and interactive representations of the previous changes. Using ProjectTales, 
project managers can explore the history of projects, find the change situations similar to their current 
one, interpret the impact of the change decision, and potentially reuse the decision and the rationale of 
the change. We are currently planning a user evaluation to compare our tool with a baseline system. 

1 Introduction 


In the last few decades, many software programs have been designed and developed to address various 
aspects of project management such as estimating a project’s effort (Peischl, Nica, Zanker, & Schmid, 2009), 
managing collaborative activities (Zhang, Zhao, Moody, Liao, & Zhang, 2007), and monitoring projects’ 
changes and risks (Smith, Bohner, & McCrickard, 2005). In this study, we are interested in assisting project 
managers with making effective project-related change decisions. Projects are not conducted in a vacuum and 
are normally affected by different dynamic factors such as availability of budgeted resources, level of priority 
in the organization, and project members. Hence, it is expected that project managers commonly need to 
make modifications to projects during the management process. 

Researchers have explored computer supported means of helping project managers to make valid 
decisions when projects need to be adjusted to the new situation. For example, Karvonen (1998) presented 
a computer supported management process to illustrate how computer systems could support project 
manager’s decision-making in a change situation during a delivery project as well as in the continuous 
business process improvement of a company. Sauve et al. (2008) presented a method to evaluate the risk 
exposure associated with a change to be made to the infrastructure and services in IT service management. 
With the risk exposure metric, this method automatically assigns priorities to changes. 

Our approach supports project managers in making change decisions by presenting interactive 
visual representations of the change history of the previous projects. Our assumption is that when facing a 
change situation, project managers may make more effective decisions if they are aware of rationales and 
decisions of the previous projects that correspond to the similar situations as well as the impact of those 
decisions on the projects. Therefore, our design of the interactive visualization tool has focused on presenting 
and making associations 1) among the history of previous projects’ changes, their causes, and their decisions’ 
rationales; and 2) between the decisions and the project status as an indicator of the decisions’ impact on 
the project. In this poster paper, we describe our current prototype system, ProjectTales. In the remaining 
sections, we first discuss related work and then present the design rationales of the prototype. We conclude 
with our user evaluation plans and a summary of the research. 


2 Related Work 


After surveying project management tools such as TeamSpace and TeamSCOPE, Smith et al. (2005) 
proposed design guidelines of a project management tool for distributed design teams. They argued that to 
facilitate reuse of project management knowledge from previous projects, it is important to archive both 
projects’ product knowledge and their process-related knowledge. To illustrate their configuration approach 
of project management information systems, Berzisa and Grabis (2011) presented how knowledge of 
previous change requests can be combined with a request in the current project. In their approach, a request 
has two configurations in the system: the attributes of the request (e.g., priority, description, remaining, 
due date) and the status workflows (e.g., closed, open, agreed). When a request is made in the current 
project, the system will provide suggested descriptions of the attributes and status workflows based on the 
previous projects, i.e., the project management knowledge repository. 

Compared to the system presented by Berzisa and Grabis (2011), which facilitates the 
configuration of current changes based on the history of previous projects, our system focuses on facilitating 
project managers’ decision-making process using the change knowledge from the previous projects. In 
addition, we use interactive visualization techniques to let project managers browse and extract change 
information relevant to the current project situation and to reason about the association between the changes 
and their impact on the projects. We discuss our prototype in more detail in the next section. 


3 ProjectTales: An interactive visualization tool 


3.1 Database and Its Design Rationale 


Our prototype, ProjectTales, is designed and built upon the history database of the open source software 
EGroupware (http://www.egroupware.org/). EGroupware is intended for businesses to manage 
contacts, appointments, projects, and to-do lists. It is an actively used software program with a reliable 
user reputation. For example, its user rating is 4.2 out of 5 on SourceForge.net (164 votes), and on its 
enterprise collaboration website, there are 9352 topics and 27071 posts for EGroupware users’ discussion 
forum as of Sept. 13, 2013. We thus understand that EGroupware’s history database has been widely 
accepted and used in real-world project management cases, and so we used it as the underlying database 
for ProjectTales. EGroupware’s history database provides diverse attributes for projects (e.g., title, 
description, priority, used budget) and several attributes for changes (e.g., project, changed project 
attribute, timestamp). We modified this history database to include the causes of the changes. With the 
causes available to project managers, they can easily retrieve the change decisions that correspond to the 
same causes as the current situation. When modifying a project, project managers are asked to assign one 
or more causes to the change. These causes can be selected from a pre-specified list of potential causes 
extracted from the literature (Dvir & Lechler, 2004; Steffens, Martinsuo, & Artto, 2007; Wu, Hsieh, & 
Cheng, 2005), including the external causes (e.g., political, economic, customer needs, market force) as well 
as the internal causes (e.g., staff or budget shortage, strategic decisions, quality control). To allow 
project managers to tailor the causes to their specific organizational situation, they can add new ones 
to the list. 
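
The extended change record and the cause-based retrieval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the EGroupware schema or the ProjectTales implementation: the names `ChangeRecord`, `KNOWN_CAUSES`, `add_cause`, and `changes_with_shared_cause` are hypothetical, and "same causes" is interpreted here as "shares at least one cause with the current situation".

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative starting list, taken from the example causes named in the text.
KNOWN_CAUSES = {
    "political", "economic", "customer needs", "market force",   # external
    "staff shortage", "budget shortage", "strategic decisions",  # internal
    "quality control",
}


@dataclass(frozen=True)
class ChangeRecord:
    """One change to one project attribute (hypothetical structure)."""
    project: str
    attribute: str          # the changed project attribute
    timestamp: datetime
    causes: frozenset       # one or more causes assigned by the manager
    rationale: str = ""     # justification of the change decision


def add_cause(cause):
    """Let a project manager extend the pre-specified cause list."""
    KNOWN_CAUSES.add(cause)


def changes_with_shared_cause(history, current_causes):
    """Return past changes that share at least one cause with the current situation."""
    wanted = set(current_causes)
    return [change for change in history if change.causes & wanted]
```

A manager facing a budget shortage would call `changes_with_shared_cause(history, {"budget shortage"})` and then read the rationales of the returned records.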

Arguing for the importance of providing the change rationales to the project manager, we also 
included this information in the database. In our research, we define the change rationale as the information 
that provides justification of the change decision. Social psychologists have investigated the kinds of 
information in communication that influences people’s attitudes (Barnard, Mason, & Ceynar, 1993; Petty 
& Wegener, 1998). These studies showed that shared information that is plausible and logical and adds 
something new to the issue is more likely to cause attitude change (Petty & Cacioppo, 1984; Leippe & Elkin, 1987). 
Fabrigar, Priester, Petty, and Wegener (1998) found that when people had higher access to different 
attitudes, the message’s argument quality had a greater impact on persuasion. We believe that presenting 
not only the change decisions but also their rationales can benefit project managers when deciding how to 
respond to the current situation. Xiao and Carroll’s (2013) study indicates that individuals’ reasoning skills are affected 
by sharing task rationale explicitly in group activities. By archiving and sharing the change rationales across 
the organization’s projects, it is expected that the project managers will improve their reasoning skills on 
making decisions in the long term. The improved reasoning skills can positively affect the quality of change 
decisions since decisions are made after better reasoning processes. 


3.2 Interface Design 


The ProjectTales interface is divided into two separate components arranged vertically on the screen: the history 
overview component and the project detailed view component. Each component of the system then provides 
some coordinated views of different dimensions of the underlying dataset, supporting project managers in 
gaining deeper insight into the history data (Wang Baldonado, Woodruff, & Kuchinsky, 2000). Figure 1 
shows a screenshot of the system. 


3.2.1 History Overview Component 


The history overview component consists of three different views, namely the history grid, the cause bar, 
and the rationale bar. The history grid is designed to provide a visual overview of the changes made in 
different attributes of different projects. In this view, each row represents a project in the database and 
each column represents an attribute of the project. Since colour is a pre-attentive visual feature and can be 
separately decoded from the spatial position by the human visual system (Ware, 2004), colour coding is 
used to represent the number of changes in different attributes of projects. When there has been no change, 
the cell is coloured with a light gray. In case of a change, the number of changes is encoded using the 
luminance channel of the red colour; thus, cells with high change frequency appear darker and the ones with 
low change frequency are shown brighter. 
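
The colour coding described above can be sketched as a small mapping from change counts to cell colours. This is an illustrative sketch only: the lightness range (0.85 down to 0.30), the gray value, and the function names are assumptions, and HLS lightness is used here as a simple stand-in for the luminance channel.

```python
import colorsys

NO_CHANGE_COLOUR = "#d3d3d3"  # light gray for cells without any change


def cell_lightness(count, max_count, lightest=0.85, darkest=0.30):
    """Map a change count to a lightness value: more changes -> darker."""
    share = count / max_count
    return lightest - (lightest - darkest) * share


def cell_colour(count, max_count):
    """Return a hex colour for one grid cell (project x attribute)."""
    if count == 0:
        return NO_CHANGE_COLOUR
    # Hue 0.0 is red; saturation stays at 1.0 so only lightness varies.
    red, green, blue = colorsys.hls_to_rgb(0.0, cell_lightness(count, max_count), 1.0)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )
```

Keeping hue and saturation fixed means that the only visual variable carrying the change frequency is how dark the red appears, which matches the single-channel encoding described in the text.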

The history grid is further enriched with an interaction mechanism, allowing project managers to 
sort the projects according to the number of changes in a specific attribute by clicking on the attribute 
label. Similarly, clicking on a project label causes the attribute list to be sorted according to the number of 
changes for that project. Such a visual representation allows project managers to quickly gain an overview 
of the change frequency across different projects and different attributes, and to further explore this 
information for projects and attributes of interest using the sort feature. 
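
The two sort interactions can be illustrated with a plain nested dictionary standing in for the history grid. The data layout, the function names, and the choice to put the most frequently changed items first are assumptions made for this sketch; the paper does not specify the sort direction.

```python
def sort_projects_by_attribute(grid, attribute):
    """Order projects (rows) by their number of changes in one attribute."""
    return sorted(grid, key=lambda project: grid[project].get(attribute, 0), reverse=True)


def sort_attributes_by_project(grid, project):
    """Order attributes (columns) by their number of changes in one project."""
    counts = grid[project]
    return sorted(counts, key=lambda attribute: counts[attribute], reverse=True)


# Hypothetical grid: project -> attribute -> number of recorded changes.
grid = {
    "Website redesign": {"budget": 1, "priority": 4, "title": 0},
    "Data migration": {"budget": 3, "priority": 2, "title": 0},
}

rows = sort_projects_by_attribute(grid, "budget")
columns = sort_attributes_by_project(grid, "Website redesign")
```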

In addition to the history grid, the history overview component offers two vertical histograms to 
visually depict an overview of the causes and rationales for the changes. The cause bar represents a sorted 
view of the causes from the most frequent one to the least, whereas the rationale bar shows the most 
frequent words used in the change rationales. Since stop words are of no value when exploring the prominent 
words in rationales, they are eliminated from the rationale bar. The history grid is tightly linked with the 
cause and rationale bars via a linking and brushing technique. Hovering the cursor over each cell in the 
history grid highlights the corresponding causes and rationales in both histograms. In addition, if project 
managers hover the cursor over a cause or a rationale, the corresponding cells in the history grid get 
highlighted. 
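
The data behind the two histograms can be sketched with simple frequency counts. This is a minimal sketch, assuming each change is a dictionary with a list of causes and a free-text rationale; the stop-word list below is a small illustrative sample, not the list used in ProjectTales.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "was", "is", "in", "for"}


def cause_bar(changes):
    """Causes sorted from the most frequent to the least frequent."""
    counts = Counter(cause for change in changes for cause in change["causes"])
    return counts.most_common()


def rationale_bar(changes, top_n=10):
    """Most frequent rationale words, with stop words removed."""
    words = []
    for change in changes:
        for word in re.findall(r"[a-z']+", change["rationale"].lower()):
            if word not in STOP_WORDS:
                words.append(word)
    return Counter(words).most_common(top_n)
```

Both functions return `(label, count)` pairs, which is the form a histogram view would typically consume.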


3.2.2 Project Detailed View Component 


When exploring the change history overview, project managers may be interested in focusing on a particular 
project in detail. Double clicking on a project label loads the project detailed view component with two 
linked views of the status line chart and the project table. In the status line chart, the x-axis represents the 
duration of the project as a timeline. This axis is augmented with red glyphs, each representing a change 
that has happened to the project at a specific time. Presenting changes on a timeline may allow project 
managers to investigate the sequence of changes and see how a particular change has triggered the following 
changes. The y-axis of the chart represents the status of the project, allowing project managers to assess the 
effects of a particular change or a series of changes on the project status. 

Finally, the project table provides project managers with the detailed information of changes, 
including the changed attribute, time of the change, old and new values, causes of the change, and the full 
rationale, in a textual format. These two views are also linked with linking and brushing techniques: 
hovering the mouse cursor over a glyph in the chart or a row in the table causes the corresponding 
representation of the change to be highlighted in the other view. These two views allow project managers to 
identify a potentially interesting change in one view and then detect the same change in the other view to 
perform further exploration. 

Figure 1: A screenshot of ProjectTales 


4 Evaluation 


We are currently conducting an evaluation study of the prototype in a controlled laboratory setting. This 
study is designed as a within-subject study, in which participants are asked to perform a different decision- 
making task with each interface (ProjectTales and a table-based baseline system). The two tasks are 
designed to have the same complexity level and they consist of three subtasks about retrieving and 
interpreting information about changes and change rationales. We have been recruiting project managers 
to use ProjectTales and compare it with the baseline system. 

5 Conclusion 


Change management “is an integral process related to all project internal and external factors, influencing 
project changes; to possible change forecast; to identification of already occurred changes; to planning 
preventive impacts; to coordination of changes across the entire project” (Voropajev, 1998). In this poster 
paper, we presented a software prototype whose main purpose is to help project managers make 
more efficient and effective decisions when deciding on a possible change. With our current design, ProjectTales 
provides project managers with the ability to browse, explore, and gain insight into the history of causes, 
decisions, and rationales of the previous changes through interactive visualization techniques. Our 
underlying design rationale is that archiving the change causes and rationales is as important as archiving 
the change decisions themselves for future reuse. We are currently conducting an evaluation study to examine 
the tool’s usability and our design rationales. 

Abstract 

Despite the efforts of academic libraries to develop various programs and services for users, college 
students still suffer from “Library Anxiety.” Also, academic libraries need to serve a new generation 
called the “Net Generation,” those students who have grown up using developed information technology. 
It is important that academic libraries adopt new technologies to attract these users. This study proposes 
a novel research design (Regression Point Displacement, RPD) that will assess the utility of Mobile 
Augmented Reality (MAR) during an orientation program for STEM students. Because implementation 
of the new technology involves an initial introduction cost, the RPD design, which requires only one 
treatment group, can be beneficial in the evaluation of MAR. The results of this study will determine 
the effectiveness of an orientation program using MAR in the context of a real life learning experience. 

1 Introduction 


Although academic libraries have developed various user-centered outreach programs, services and new 
processes, “library anxiety” remains a threat to college students’ full use of academic libraries (Brown, 2011; 
Kwon, 2010). Only a few students use a library as a starting point of their research, and many students are 
reluctant to approach a library, as well as a librarian, feeling discomfort or tension when they do so (Lee, 
2012). Further, library anxiety hampers not only library use, but the critical thinking skills necessary for 
students’ research (Kwon, 2010). 

While academic libraries are struggling to find ways to overcome library anxiety, they also have 
difficulty attracting a new group of students called the “Net Generation.” The Net Generation is composed 
of students who have grown up using developed information technology. With respect to library use, when 
students research a given topic, they heavily rely on web-based resources rather than on books (Kwon, 
2010). In order to find better ways to serve the Net Generation, libraries need to apply new tools that are 
more approachable and accessible. 

One possible solution, Mobile Augmented Reality (MAR) technology, has received increasing 
attention. MAR technology not only integrates library resources into a user’s environment, but also 
enhances interaction and promotes library use (Chen & Tsai, 2012; Hahn, 2012). There is evidence that 
MAR integration could potentially be effective within libraries, as various researchers have demonstrated 
its effectiveness in other settings, including museums and elementary school media centers (Chen & Tsai, 
2012; Damala, Cubaud, Bationo, Houlier, & Marchal, 2008). 

As always, however, the resources of academic libraries are limited. Moreover, there can be financial 
risks if libraries obtain new equipment and train staff in the use of the new technology, but implementation 
is not successful. It can also be difficult for academic libraries to determine the right timing and scope of 
This study proposes a method to evaluate MAR’s effectiveness by employing a quasi-experimental 
design called Regression Point Displacement (RPD). The research will apply the MAR in an orientation 
program for freshmen to address the following research question: “To what extent does the application of 
MAR increase academic library newcomers’ task performance in a library?” 

The study will be a starting point for further research, as the initial efforts will examine MAR’s 
effectiveness in an academic library setting. Also, the findings from this study can be used to promote an 
academic library’s reference and outreach services, and to provide new users with a rich learning experience. 
Establishing MAR’s effectiveness can inform educators regarding these new tools for informal learning. More 
broadly, the findings will potentially provide a basis for applications and recommendations to adopt MAR 
in other informal learning organizations. 


2 Methodology 


2.1 Regression Point Displacement (RPD) 


A novel design for evaluating pilot programs in a quasi-experimental environment is termed Regression Point
Displacement (RPD; Trochim & Campbell, unpublished; Linden, Trochim, & Adams, 2006). This regression model
compares pretest-posttest results of a treatment group to the regression line established by the
pretest-posttest scores of several control groups. A treatment effect is presumed when the results of the
treatment group are significantly different from the regression line.

To conduct the RPD, the researcher constructs a regression line of the control groups using the results from
pre-post measures. Then, the researcher calculates the vertical displacement of a treatment group’s post-test
results from the regression line.

[Figure: RPD plot of Post-test against Pre-test, showing the treatment group and the control groups]

The statistical model of Analysis of Covariance (ANCOVA) used for RPD is:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + e_i$$

where $Y_i$ = the post-test value for unit $i$; $X_i$ = the pre-test value for unit $i$; $Z_i$ = 0 if the X, Y
pair is for a control unit, and 1 if the X, Y pair is for the treated unit; $\beta_0$ = intercept term;
$\beta_1$ = linear slope; $\beta_2$ = the treatment effect (vertical shift from the regression line); and
$e_i$ = the residual. The researcher can identify the treatment effect clearly when there is a statistically
significant p value in the $\beta_2$ coefficient (Linden et al., 2006, p. 421).
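
As an illustration only, the model can be sketched in a few lines of Python. Because the dummy variable flags a single treated unit, ordinary least squares gives the same intercept and slope as a regression on the control units alone, and the treatment coefficient equals the treated unit's vertical distance from that line. The function name `rpd_effect` and the sample numbers are hypothetical and are not taken from the study.

```python
import math

def rpd_effect(control_pre, control_post, treated_pre, treated_post):
    """Estimate the RPD treatment effect for one treated unit.

    Fits post = b0 + b1 * pre on the control units, then measures the
    treated unit's vertical displacement (b2) from that line.
    """
    n = len(control_pre)
    if n < 3 or n != len(control_post):
        raise ValueError("need at least three matched control units")
    mean_x = sum(control_pre) / n
    mean_y = sum(control_post) / n
    sxx = sum((x - mean_x) ** 2 for x in control_pre)
    if sxx == 0:
        raise ValueError("control pre-test values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(control_pre, control_post))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    b2 = treated_post - (b0 + b1 * treated_pre)

    # Residual variance of the control regression (n - 2 degrees of freedom).
    sse = sum((y - (b0 + b1 * x)) ** 2
              for x, y in zip(control_pre, control_post))
    df = n - 2
    s2 = sse / df
    # Standard error of b2: the prediction error for one additional unit.
    se_b2 = math.sqrt(s2 * (1 + 1 / n + (treated_pre - mean_x) ** 2 / sxx))
    t_stat = b2 / se_b2 if se_b2 > 0 else float("nan")
    return {"b0": b0, "b1": b1, "b2": b2, "t": t_stat, "df": df}

# Hypothetical unit-level means: five control units and one treated unit.
result = rpd_effect([0, 1, 2, 3, 4], [1.0, 3.2, 4.8, 7.2, 8.8], 2, 3.0)
print(result)
```

The t statistic would be compared against a t distribution with `df` degrees of freedom to obtain the p value for the treatment coefficient. With time on task as the outcome, a negative `b2` would mean the treated unit finished faster than the control line predicts.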


2.2 Regression Point Displacement (RPD) 
The experimental design used will be Regression Point Displacement (RPD) (Table 1). 


                   Pretest           Treatment                   Posttest
Treatment group    Library Anxiety   An orientation using MAR    Time on task
Control groups     Library Anxiety   A traditional orientation   Time on task

Table 1: Comparison of the Two Groups

The Independent Variable: The independent variable will be the pre-test measure on the Library
Anxiety Scale. 


The Dependent Variable: The dependent variable will be based on actual task performance: participants’ 
time on task. For example, after participants demonstrate an understanding of prompts, each participant 
will have an assigned library PC. They will conduct several activities, and report the time when each 
activity is completed on that PC. 


Intervention: As a treatment, the MAR service, which combines real and virtual information in real time, 
will be developed using Layar, one of the augmented reality development programs. The content will 
correspond to that included in a traditional orientation program, but will be presented at the point of the 
user’s desired location. There are three primary types of MAR programs: maps, information services, and
links to a specific web page or to downloadable content. For example, the maps used for the experiment
will be in 3D, and their MAR application will include descriptions of each section. There will be a sign in
front of each piece of equipment, such as a printer or scanner; its MAR content will provide a video
demonstration of how to use the equipment and information on fees or reservations. To reserve a study room, for
example, the MAR application will link immediately to the reservation web page. 

All participants will perform six activities: 1) finding a book on a specific topic; 2) locating Science 
magazine; 3) making a color copy of the first page of Science; 4) making a color scan of a dissertation 
regarding a specific topic; 5) sending an email to a designated address, and 6) reserving a study room. The 
tasks required to find a book and a dissertation will be slightly different, allowing five students to conduct 
the activities simultaneously. Based on the results from each activity, this study will conduct six RPDs. 


Participants and Groups: The participants will be freshmen in the STEM field from a large state 
university. Since the RPD design uses aggregated, mean-level data, the unit of analysis will be the 
aggregated students’ average time on task per activity. For example, assume the academic library provides
four orientation sessions per day over the course of ten days. Researchers will choose five students randomly
from each session, resulting in 20 students per day. On one of the ten days, students will receive the MAR
treatment instead of a traditional orientation; the pre-test and post-test means of these 20 students will
establish the single treatment unit. The groups of students on the remaining nine days will establish the set
of control groups.
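
The aggregation step described above can be sketched as follows. This is an illustrative Python example only: the record layout (day, pre-test score, post-test time), the function names, and the numbers are hypothetical placeholders rather than study data.

```python
from collections import defaultdict

def unit_means(records):
    """Collapse student-level records into one (pre, post) mean per day.

    `records` is an iterable of (day, pre_score, post_time) tuples; each day
    becomes one unit of analysis for the RPD.
    """
    grouped = defaultdict(list)
    for day, pre, post in records:
        grouped[day].append((pre, post))
    means = {}
    for day, pairs in grouped.items():
        n = len(pairs)
        means[day] = (sum(p for p, _ in pairs) / n,
                      sum(q for _, q in pairs) / n)
    return means

def split_units(means, treated_day):
    """Separate the single treated unit from the control units."""
    treated = means[treated_day]
    controls = [m for day, m in sorted(means.items()) if day != treated_day]
    return treated, controls

# Hypothetical records: two students per day for three days; day 2 is treated.
records = [(1, 40, 300), (1, 50, 340), (2, 44, 250), (2, 46, 270),
           (3, 60, 380), (3, 50, 360)]
treated, controls = split_units(unit_means(records), treated_day=2)
print(treated)   # (45.0, 260.0)
print(controls)  # [(45.0, 320.0), (55.0, 370.0)]
```

Each tuple in `controls` is one point on the control regression line, and `treated` is the single point whose displacement from that line is tested.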

The five students assigned randomly from each session will participate in the activities 
simultaneously. The treatment group, indicated by the red dot in Figure 2, will be educated with MAR 
technology, both generally (e.g., location of collections and library services) and specifically (e.g., how to 
use a scanner). The control groups, indicated by the blue dots in Figure 2, will receive traditional orientation 
that consists of precisely the same content, but is administered by librarians. 


Methods: In order to achieve a strong treatment effect, the researchers will use signs of at least 8.5 by 11
inches to show participants the locations where they can use MAR and to introduce the applications. The
MAR group will use an iPad rather than an iPhone, because of the former’s larger screen. The researchers 
will make it possible for participants to use MAR for all signs, including directories/maps and book stacks, 
and all technology and tools in order to maximize MAR use. 


3 Expected Results 


The control groups who have support from librarians are expected to require substantially more time to 
complete activities, while the treatment group is predicted to spend less time when they use MAR for their 
orientation. By analyzing each activity, our results will determine the effectiveness of an orientation program 
using MAR, and this study will provide users with a differentiated learning experience. Further, the
augmented reality approach can promote an academic library’s reference and outreach services by 
identifying the right timing and scope of technology introduction for students who are quite familiar with 
the Internet as a means to access information. 
Abstract

The term “Scientometrics” was coined in 1969 by scholars from the former Soviet Union. After decades
of growth, the international field of scientometrics has become increasingly mature. This research
examines the growth of the journal Scientometrics, measured by annual publication number, annual growth
of publication number, annual citation frequency, and annual growth of citation frequency. Correlation
analysis and regression analysis are applied in an attempt to model the literature growth of the journal.
By considering internationally important events in the field of scientometrics since 1978, the growth of
the discipline is divided into three stages. Each stage is further analyzed using regression analysis to
trace the literature growth. In addition, visualization methods are used to identify the main topic areas
for each of the three stages of scientometrics’ development.
1 Background


The term “Scientometrics” was first used as a translation of the Russian term “naukometriya”
(measurement of science) coined by Nalimov and Mulchenko (1969). The research area of scientometrics
began during the second half of the 19th century and has over 100 years of history. During this time,
scientists’ studies of scientometrics shifted from the unconscious to the conscious, from qualitative research
to quantitative research, and from external description to detailed study revealing the inherent properties
of scientific production. Previous scholars (Pang, 2002; Yuan, 2010) tend to divide the development of
scientometrics into three stages: the embryonic period (from the second half of the 19th century to the early
20th century), the founding period (from the beginning of the 20th century to the 1960s), and the development
period (after the 1970s). With regard to the development period, Schubert (2002) indicated that, as the
representative communication channel of its field, the journal Scientometrics reflects the characteristic
trends and patterns of the past decades in scientometric research. For this reason, this study, like some of
its predecessors (Schoepflin & Glänzel, 2001; Hou, 2006), uses the journal as a representative model of
scientometrics research.


2 Purpose 


This paper presents a comprehensive statistical overview of the journal Scientometrics to study the
evolution of scientometrics. Quantitative analyses and informetric methods are employed to describe the
evolution of scientometrics over the nearly 40 years since it entered the development period. Visualization
methods are then used to display the growth of publications and the main research areas of each stage of
the development period.
3 Data and Methods


For the development period of scientometrics, the founding of the journal Scientometrics (in September
1978) is a landmark event. The research data comprise 3482 documents published in Scientometrics from
1978 to 2013, retrieved from the Web of Science on August 20th, 2013. Microsoft Excel was used to count
Scientometrics’ annual publication number, annual cited frequency, and annual growth of cited frequency,
and correlation analysis and regression analysis in SPSS were used to model the literature growth curve of
the field. In addition, highly cited papers were chosen from each stage of the development period, and
document co-citation analysis, factor analysis and multidimensional scaling analysis were used to reveal the
main research areas of each stage.
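
The counting and correlation steps can be illustrated with a short Python sketch. The study itself used Microsoft Excel and SPSS, so the code below is only an equivalent illustration: the function names are hypothetical, and the yearly counts are made-up placeholders rather than the journal's data. Annual growth is taken here as the year-over-year change in the count, which is an assumption about how the growth series was computed.

```python
import math

def annual_growth(counts):
    """Year-over-year change in a series of annual counts."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = [1978, 1979, 1980, 1981, 1982]
papers = [10, 12, 15, 14, 20]        # placeholder counts
print(annual_growth(papers))         # [2, 3, -1, 6]
print(round(pearson_r(years, papers), 3))
```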


4 Results 


According to the analysis of the annual number of publications, annual cited frequency, annual growth of
publication number and annual growth of cited frequency, the annual publication number and the annual
cited frequency show similar trends: a period of relatively flat growth followed by continued rapid growth
(shown in Figure 1 and Figure 2). The growth rates of the annual publication number and the cited
frequency also follow similar tendencies, showing continued volatility after the first vertex (shown in
Figure 3 and Figure 4).
Figure 1: Annual publication number of Scientometrics
Figure 2: Annual cited frequency of Scientometrics
Figure 3: Annual growth of publication number


Figure 4: Annual growth of cited frequency 


A bivariate correlation analysis between the publication number and the publishing time resulted in a
Pearson correlation coefficient of 0.880, which is significant at the 0.01 level (two-sided). The
data were modeled against different growth functions. Of these, the logistic function shows a relatively high
goodness of fit based on the R square and F values, which indicates that the logistic model provides the best
fit for the growth of articles in the journal Scientometrics. The fitted curve is shown in Figure 5, with the
parameter estimates appearing in Table 1.


Model      R square   F         df1   df2   Sig.   Constant   b1
Logistic   .792       125.715   1     33    .000   2.671E52   .939


Table 1: The estimate of parameters 
Figure 5: Scientometrics’ paper growth curve fitting
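
To make the fitted model concrete, the sketch below evaluates a curve from the Table 1 parameters. It assumes the logistic form used by SPSS Curve Estimation, Y = 1 / (1/u + b0 * b1^t), with no upper bound u specified (so the 1/u term is zero) and with t taken as the calendar year; neither assumption is stated in the text. Because b1 is reported to only three decimals, absolute values computed this way are very sensitive to rounding and should not be read as the authors' fitted values. The implied yearly growth factor 1/b1 is the more robust quantity.

```python
def logistic_curve(t, b0, b1, upper=None):
    """SPSS-style logistic curve: 1 / (1/u + b0 * b1**t).

    With no upper bound, the 1/u term is dropped.
    """
    inv_upper = 0.0 if upper is None else 1.0 / upper
    return 1.0 / (inv_upper + b0 * b1 ** t)

B0 = 2.671e52   # "Constant" in Table 1
B1 = 0.939      # "b1" in Table 1

# Without an upper bound the curve grows by a constant factor 1/b1 per year.
growth_factor = logistic_curve(2001, B0, B1) / logistic_curve(2000, B0, B1)
print(round(growth_factor, 3))  # 1.065
```

Under these assumptions the fitted curve corresponds to growth of roughly 6.5% per year in the number of published papers.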


Taking all the trends in Figure 1 to Figure 4 into account, and referring to the important events in the
international field of scientometrics from 1978 to the present, the evolution of scientometrics can be divided
into three stages: the early development stage (from 1978 to 1986), the gradual maturing stage (from 1987 to
the end of the 20th century) and the golden development stage (from the beginning of the 21st century to the
present). The fitted curves of the growth of the publication number over publication time for the three stages
are shown in Figure 6, Figure 7 and Figure 8, respectively. The regression analyses reveal that the growth of
the number of publications follows a linear model during the first stage and an exponential model during the
third stage. However, the data of the second stage show relatively large fluctuations, resulting in poor fits
for both linear and nonlinear models.
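
The stage-by-stage comparison of growth models can be sketched as below. This is an illustration only: it fits a straight line and an exponential curve (through a straight-line fit to the logarithm of the counts) and compares their R square values. The counts are placeholders, the function names are hypothetical, and computing the exponential R square on the log scale is a simplifying choice made for this sketch.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return a, b, r_squared

def exponential_fit(xs, ys):
    """Fit y = c * exp(k*x) by regressing ln(y) on x; returns (c, k, r_squared)."""
    a, k, r_squared = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(a), k, r_squared

# Placeholder counts that double each year (not the journal's data).
t = [0, 1, 2, 3, 4]
counts = [10, 20, 40, 80, 160]
_, _, r2_linear = linear_fit(t, counts)
c, k, r2_exponential = exponential_fit(t, counts)
print(r2_linear < r2_exponential)  # True
```

A stage whose counts fluctuate strongly, as reported for the second stage, would yield low R square values under both fits.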
Figure 6: Scientometrics’ paper growth curve fitting of early developing stage


iConference 2014 Yuehua Zhao & Rongying Zhao 


Figure 7: Scientometrics’ paper growth curve fitting of gradually maturing stage 
Figure 8: Scientometrics’ paper growth curve fitting of golden developing stage


The outcomes of document co-citation analysis, factor analysis and multidimensional scaling analysis reveal
the main research fields of each stage. The early development stage has five main fields, listed here in order
of the variance contribution of their principal components: (1) scientific literature structure, citation
analysis theory; (2) quantitative and statistical bibliometric laws; (3) application of citation analysis,
scientific cooperation; (4) discipline analysis; (5) evaluation of scientific research. The gradual
maturing stage also contains five main themes: (1) citation theory, basic scientific research evaluation; (2)
scientific collaboration; (3) co-citation, science mapping; (4) scientometric indicators, application of
bibliometrics; (5) scientific elite. Seven focal topics arise during the golden development stage: (1)
application of citation indexes, patent analysis; (2) assessment of research performance; (3) classic
probability distributions in scientometrics and their application; (4) social network analysis and scientific
research collaboration, university ranking; (5) Hirsch-type index; (6) research collaboration and national
research performance; (7) application and visualization of scientometric indicators. Figure 9, Figure 10 and
Figure 11 present the respective results from multidimensional scaling analysis. The thickness of each principal
component boundary line reflects the heart of the research field and the distribution of the influence.
Figure 11: Main research field of golden developing stage of scientometrics
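
Document co-citation analysis starts from counts of how often two documents are cited together. As an illustration only, the sketch below builds such counts in Python; the reference lists are invented and the function name is hypothetical. Factor analysis and multidimensional scaling of the kind reported above would take a matrix of such counts as input.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of documents is cited by the same paper."""
    counts = Counter()
    for refs in reference_lists:
        # Each unordered pair within one reference list is one co-citation.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

# Invented reference lists of three citing papers.
citing_papers = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C", "D"],
]
counts = cocitation_counts(citing_papers)
print(counts[("A", "B")])  # 2
print(counts[("B", "C")])  # 2
print(counts[("A", "D")])  # 0
```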


5 Conclusions 


According to the results of the regression analysis, the growth trends of publication in the three stages of
the development period fit the growth model of scientific literature proposed by Price (1951).
Scientometrics has experienced rapid growth in the amount of literature being published, and this trend
is likely to continue. In addition, a comprehensive and comparative review of the main research
areas of each stage leads to the following conclusions: (1) citation analysis has been a core research
area during every stage; (2) research interest has shifted from theoretical research to applied aspects; (3)
visualization methods and scientific mapping have attracted more and more attention and will become a
main research area of scientometrics.
Abstract

This poster describes an incremental approach to developing reflective practice in LIS students anchored 
in three different types of assignment debriefing incorporated into a regularly taught elective course on 
user instruction. An overview of these three reflective techniques illuminates tensions in reflective practice 
and suggests how each technique might fit into a programme of building reflective practice for future LIS 
practitioners. In particular, it shows how student and teacher reflection work together to inform each 
other, evolving practice for the learner and educator. 
1 Introduction


Reflection is a fundamental aspect of learning from experience, and reflective thought is essential to the 
improvement of both teaching and learning. Reflective practice is a term coined by Donald Schön in his 
important text The Reflective Practitioner (1983). Practitioners often know more about what they are 
doing than they can articulate. In fact, much of the work of effective practitioners becomes tacit over time. 
However, to maintain professional growth, as well as to document the contributions of effective practice in 
society, it is critical that we develop a culture of active reflection among LIS practitioners. Schön’s early 
articulation suggested that much of what happens in graduate school is of little value compared to workplace 
professional development. However, in later work (1987) he suggests that reflection can be cultivated and 
bridging activities can prepare students for effective practice through structured debriefing. Newer research 
further suggests that we can employ teacher-designed technologies to facilitate reflection, and build reflective 
patterns that students can emulate long after they depart university for the working world (Laurillard, 
2012). 

While Schön is credited with originating this specific term and its definition, we see the roots of 
reflective practice in much earlier writing, specifically in the works of the philosopher and educational 
reformer John Dewey (1933; 1938). Reflective practice is commonly linked with the ongoing development 
of professional skills, but can be applied to many fields of human endeavour in which people strive to 
improve. In Dewey’s conceptualization, reflection combines experiences or events with prior experiences, 
thoughts and emotional reactions in a process of lifelong learning. Reflection is key to continuity, to human 
growth and development, and helps the learner to avoid miseducative experiences: those that undermine 
our ability to engage in future learning. 

While the process of reflection is considered of vital importance, as teachers we often find it difficult 
to fit into our overloaded curricula. Furthermore, for many students who are new to a domain, reflection is 
uncomfortable: it exposes the weaknesses in their developing conception of self as a practitioner and 
highlights their role as novices. Students in the author’s courses report that reflective assignments are rarely 
prescribed in formal education, in their prior LIS courses or in their undergraduate experiences. The research 
on professional education (e.g., Loughran, 1996; Lyons, 2010) supports this anecdotal evidence. Yet, without 
effective guidance, practice, and motivation, reflection is not likely to become a regular part of a
practitioner’s routine. One of the most direct approaches to building reflective practice is the assignment 
debrief, whereby students reflect on their work or performance and identify strengths and opportunities for 
improvement, either as a conversation in class or in a written essay submitted to the instructor. 


2 Method and Data Source 


This paper will document three different debriefing techniques employed with a student performance-based 
assignment, showing an evolution in the author’s approach to reflective practice. In the author’s user 
instruction course, students are assigned to develop and present a 10-12 minute lesson related to an LIS 
topic and specific to a context of their choosing. Students present on a variety of topics, ranging from 
teaching parents how to select appropriate books for a toddler, to teaching seniors how to access government 
services online. Their lessons are video recorded and a copy of the video file is provided to each student to 
support their reflection within three days of the in-class performance. Three different debriefing techniques 
were employed to prompt and mediate student reflection on their teaching with different classes of students 
in sequential academic terms (16-23 students per class). 
Briefly, these three conditions were: 


• Unstructured debrief: Students were simply prompted to “reflect” on their teaching performance
without detailed instructions on what to include, beyond a suggested length (300 words) and basic
formatting criteria. Essays were to be submitted within one week of receiving the video.

• Structured debrief: Students were prompted to engage in structured debriefing of their teaching
performance using a 6-step technique adapted from Gibbs (1988). The length of the essay and
formatting were identical to the first reflective condition.

• Computer-mediated debrief: Students’ videos were uploaded to an online tool called the
Collaborative Lecture Annotation System (CLAS), an environment which allows the user to view
and annotate the video, in lieu of writing a separate reflection document. There were no length
requirements for students’ annotations.


3 Findings 

Each of these conditions prompted different types of student reflection. Unstructured debriefs were, overall, 
emotionally rich but more negative regarding student self-assessment. Structured debriefs provided a 
balance of positive and negative assessment, but often lacked reference to specific details of students’ 
teaching technique. The CLAS system focused student attention on specific aspects of their performance, 
while losing some of the broader, synthesizing statements evident in the other two debriefing conditions. 
These techniques will be illustrated in the poster with examples from students’ reflective writing and 
annotation, used with their permission. The author will also identify some of the tensions around these 
techniques, including time and effort to prepare for student reflection. Many students’ initial resistance to
participating in reflective activity was balanced by their subsequent assessment of the activity as valuable.


4 Conclusions 


This poster presents an overview of these three reflective techniques and suggests how each might fit into 
a programme of building reflective practice for future LIS practitioners. In particular, it shows how student 
and teacher reflection work together to inform each other, evolving practice for the learner and educator.
This work relates to the conference theme of “breaking down walls” through its innovative use of new
technologies and its dedication to overcoming barriers to reflective practice. It presses faculty in LIS to ask: 
How can we best educate practitioners to remain vital and effective contributors to society by promoting a 
culture of reflection? 
Abstract

This poster reports preliminary findings from a project that examines enhanced picture books for children 
(“book apps”) as designed multimodal experiences. We expand Lewis’ ecological approach to picture 
books, developing a new model for the evaluation of book apps. We report early efforts to identify the 
qualities of this emergent genre. Our poster elaborates on several tensions in the design of high-quality 
multimodal literature, and promises to inform the efforts of parents, librarians and teachers who use 
enhanced reading materials in the development of early literacy. 
1 Introduction


The introduction of new reading devices, including the Apple iPad, Amazon Kindle, Kobo Reader, and 
multitouch smartphones, is radically changing the way we consume text (Chiong, Ree & Takeuchi, 2012; 
Ellis & Blashki, 2004). The landscape of children’s reading and early literacy development in particular is 
changing dramatically as many children share access to these new devices at home, school, and the library 
(al-Yaqout, 2011; Bird, 2011; Cooper, 2005; Druin, 2009; Smith, 2002). The emergence of the “book app” 
and enhanced e-books for children marks an important milestone in the way young children engage with 
stories. Enhanced e-books are electronic books that incorporate additional features to complement 
traditional picturebook elements, namely text and images, with audio, video, animation, and interactive 
games (Bird, 2011). These new components promise to engage children in new and exciting ways, but, when 
inappropriately used, they can also distract young minds and detract from narrative comprehension.
Children are engaging with e-reading technologies at an early age, yet we still know very little about the 
effects of e-reading and whether it supports or constrains the development of early literacy (Gasparini, 2012; 
Gutnick, Robb, Takeuchi & Kotler, 2010; Hinchliff, 2008; Shuler, Levine & Ree, 2012). Furthermore, 
research studies that support our understanding of how e-books fit into the ecology of children’s literacy 
practices are few. By literacy ecologies, we mean the wider context of reading (the who, what, when and 
where) in children’s lives, not just their ability to decode text. 

This poster reports early findings from a project designed to fill the gap in our understanding of 
early literacy development in a context where digital reading is being incorporated in homes, schools and 
the workplace at a dramatic pace. It is critical to study emerging readers and their use of technology, as 
early childhood literacy can have significant effects on young people’s developmental trajectories into 
adulthood. This three-phase study explores three ecological levels of e-reading (the textual, personal, and 
institutional) contributing to a holistic understanding of enhanced texts and how they fit into the literacy 
practices of youth aged 4-8 years. In the course of this project we address the following questions:


1. What is the emerging nature of the enhanced children’s text? 
2. How does e-reading fit into the literacy practices of early readers? 
3. How does e-reading align with the broader aims and practices of institutional literacy development? 
2 Theoretical Framework


Due to the novelty of storybook apps, little theoretical reflection has been applied to this emergent form of 
text. Our study is rooted in Lewis’ ecological theory of the picture book, which places the reader at the 
centre of the reading experience, and acknowledges the interrelationship between visual and textual elements 
(Lewis, 2001). In Figure 1 we show a representation of Lewis’ theory. While this seems rather simple on its 
face, the notion that the visual and textual elements of the story work together to make meaning of a 
picturebook was, and in some circles still is, a provocative suggestion. Our theoretically informed approach 
pushes further on this concept by bringing in the multimodal elements of contemporary picturebook apps, 
recognizing that they are both designed narrative experiences and designed multimedia experiences.


[Diagram: Textual <-> Visual]

Figure 1: Lewis’ Ecology of the Picturebook


The flexibility and complexity of e-books further leads us to examine this phenomenon not only using the 
ecological theory of multimodal books, but also the ecological theory of reading. Literacy is a practice that 
takes place in varied and complex contexts, involving interplay of space, time, resources, abilities and human 
values (Burgess, Hecht & Lonigan, 2002; Smeets, 2012). To date, studies of e-books for young children have
focused on comprehension: whether reading e-books leads to better story recall or narrative understanding 
(Gasparini, 2012; Smeets, 2012; Vaala & Takeuchi, 2012). These studies involve children reading an e-book 
once, often in a sequestered fashion. The challenge in making recommendations from these studies is that 
these conditions poorly represent how children actually engage with texts, which often involves repetition, 
mediation, and playfulness. 


3 Methods and Data Source 


Our research project is addressing the research questions in three phases; each phase will tackle one of the 
questions. We are moving iteratively from the level of the book to the level of the reader to the level of the 
institution, toward greater ecological complexity with each phase. Our expanded framework is seen in Figure 
2. We first created a database of over 200 reviewed book apps appropriate for readers ages 4-8 years. We 
then narrowed this list to 100 for our coding. A rubric coding scheme was devised and tested using a second 
randomly derived sample. We have completed the coding of these texts and preliminary findings are already 
emerging. The outputs of this phase will be both a scholarly analysis of the quality of enhanced picturebooks, 
as well as criteria to guide parents, teachers and librarians in the selection of enhanced texts. 
Figure 2: The Expanded Ecology of the Book App


4 Findings 
Although we are still in the midst of our first phase analysis, several themes are emerging from our work, 
which we describe below: 


4.1 The Challenge of Multimodal Evaluation Frameworks


Storybook apps are diverse in the features they use and how they use them. As a result, developing an 
evaluation scheme for use across a wide range of multimodal texts has been particularly challenging. 
Informing our selection of book apps were a number of review sources, including professional (e.g., Kirkus, 
Horn Book, SLJ, Booklist) and user-generated review media (e.g., iTunes, iMum’s blog, Digital Storytime, 
Commonsense Media). A significant challenge to anyone selecting storybook apps is determining the 
reputability of these sources, since only a select few are connected to institutions that attest the quality of 
the reviews, e.g., Booklist, or Horn Book. Many sources concentrate their evaluation on the digital features 
present in book apps and pay little attention to the literary quality of the story. All of these factors make 
the selection of storybook apps particularly complicated. Our evaluation of review sources confirms that the 
variability in reviews can lead to confusion over quality: an e-book with excellent reviews for its literary 
merits in a professional review can be panned by parents if the app does not keep children engaged 
independently. We found that different values are evident in these reviews, and the different stakeholders 
(parents, teachers, librarians) can have widely different impressions of what a “good app” looks like. 


4.2 Metafictive Multimodality 


Apps can be “born digital”, that is, created without a specific print precursor, such as Nosy Crow’s
Cinderella (2012), which is nonetheless based on a Grimm fairy tale, or they can be direct adaptations of a new
or old text. The Monster at the End of This Book is one such adaptation. Based on Jon Stone’s 1971
picturebook, its narrative is built completely upon the notions of metafiction and self-referentiality; the
reader’s actions are what allow the narrative to unfold. Although remediated into an app, the story is 
constructed upon the conventions and gestures of the print book, exploring its materiality and investigating 
the performative act of manipulating the book as an artifact. 

Don’t Let the Pigeon Run This App! (2012) is an app based on Mo Willems’ Don’t Let the Pigeon 
Drive the Bus! (2003). The picturebook and its sequels were already considered books with high levels of 
interactivity; the app version repeats the same formula, but adds a new level of participation, performance 
and co-authoring. Participation in this app is a metafictive device, promoting awareness of the story 
structure and template to readers. This app provides an example of high participation with very limited 
direct interaction with the touchscreen during the storytelling experience, an element that to a certain 
extent likens it more to an animated film than to a print book or to most storybook apps available. 
Don’t Let the Pigeon Run This App! is an example of an app that, although migrating from the picturebook 
format, does not seem to remain attached to the print book format, exploring the features of digital media 
with more freedom, but also risking distancing itself quite substantially from the concept of a book, 
hybridizing with other forms of digital media. 


4.3 The Important Role of Paratext 


While we do not often think of paratextual elements as integral to the reading experience, in an app they 
can contain significant interactive elements. Paratexts are one aspect of book design that has 
changed significantly in the remediation from print to screen. The front cover, full title page, and half 
title page are often reduced to one screen where the title of the book is shown and where only occasionally 
author and publisher information is present. For example, The Melody Book’s A Jazzy Day (2012) does 
not mention the author, illustrator, or other professionals who participate in the complex production of an 
app. The “info” page does, however, have links to Facebook and Twitter, an element that is present in most 
storybook apps and that concerns parents and teachers because it can lead children to websites 
inappropriate for their age. To overcome this problem, some developers, notably Nosy Crow Studios, created 
a barrier that keeps children from accessing some of the paratexts. In Cinderella (2012) the credits, 
how-to-use information, and links to social media and to other apps are in a section called “For Grown-ups”, 
which can only be accessed after following some simple guidelines presented in written text, blocking it 
from young children who cannot yet read. 


5 Conclusion 


Our ecological approach to book apps expands existing notions of the picturebook and 
offers new ways of thinking about these media as interactive story experiences. At a time when technology 
is advancing faster than our understanding of its effects, we need richer approaches and deeper findings 
that explain how new media interact with our lived experience. This project is generating new scholarship 
on the way children read in the digital age and will inform parents, teachers, librarians and communities 
who mediate their literacy development. 


Abstract 

A huge amount of metadata is being published on the Internet by various communities, domains and 
countries. Metadata schema designers need to design metadata schemas for their applications with 
interoperability in mind. Linked Open Data (LOD) is a concept and movement to facilitate the use of 
datasets across communities on the Internet. However, to publish metadata as LOD, metadata schema 
designers must be experts in LOD. Metadata schema design, like software design, requires 
design and evaluation cycles in which iterative prototyping of metadata is useful. Our goal is to help 
metadata schema designers design a metadata schema in an iterative prototyping process. In this paper, 
we describe a metadata schema design methodology based on the agile development model. We then 
propose a system to support this methodology, including a metadata-editing tool. Finally, we 
show some metadata schemas as examples. 


1 Introduction 


There is a huge amount of metadata on the Internet, and still more is being published by various 
communities such as governments and research institutions. These datasets are used for different purposes 
across different communities, domains and countries. Thus, metadata schema designers need to consider 
metadata interoperability in their schema design. Linked Open Data 
(LOD) (Berners-Lee, 2006) is a concept and movement to facilitate the use of datasets on the Internet. 
However, in LOD, data must be described in RDF (Resource Description Framework), which is a standard 
model for data interchange on the Internet. Therefore, metadata schema designers need specialized 
knowledge of and experience with the Semantic Web to design a metadata schema that conforms to the 
requirements of LOD. In addition, metadata schema design is an iterative process, which is 
time-consuming. In this study, we explain a metadata schema design methodology based on an agile 
development model. Then, we propose a system to support metadata schema design and develop a 
metadata-editing tool based on the methodology. 


2 The Process of Designing Metadata Schema 


We divide the metadata schema design process into two steps (Figure 1) based on (Coyle & Baker, 2008; 
MI3, 2011) and on our experience. We assume that metadata schema designers already have clear functional 
requirements. In step 1, metadata schema designers define the domain model, metadata attributes and their 
value constraints. In step 2, metadata schema designers define the metadata structure and the metadata terms 
for properties and classes from the definitions of step 1. Figure 2 is an example of definitions of step 1 and 
step 2. Step 1 is familiar to many engineers because it corresponds to designing a relational database 
(RDB) schema. While many engineers have knowledge of and skill with RDBs, few have comparable knowledge 
of RDF and LOD. Thus, step 2 is generally more difficult than step 1. 
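
The two steps can be illustrated with a small sketch based on the example in Figure 2. Step 1 is written as plain attribute definitions, and step 2 attaches RDF terms to them. The dictionary layout, the function name `to_turtle`, the example URI, and the choice of `dcterms:title` for the title attribute are assumptions made for illustration only; the paper does not prescribe them.

```python
# Step 1: a rough schema, much like an RDB schema (entity -> attributes).
STEP1 = {
    "Book": {"title": {"mandatory": True}, "publish date": {"mandatory": False}},
    "Author": {"name": {"mandatory": False}, "birth day": {"mandatory": False}},
}

# Step 2: metadata terms chosen for each attribute.
# dcterms:issued, foaf:name and foaf:birthday follow Figure 2;
# dcterms:title is an assumption made for this sketch.
STEP2_TERMS = {
    ("Book", "title"): "dcterms:title",
    ("Book", "publish date"): "dcterms:issued",
    ("Author", "name"): "foaf:name",
    ("Author", "birth day"): "foaf:birthday",
}


def to_turtle(entity, subject, record):
    """Render one record as Turtle-like text, checking mandatory attributes."""
    for attr, rule in STEP1[entity].items():
        if rule["mandatory"] and not record.get(attr):
            raise ValueError(f"{entity}.{attr} is mandatory")
    parts = [
        f'{STEP2_TERMS[(entity, attr)]} "{value}"'
        for attr, value in record.items()
    ]
    return f"{subject} " + " ;\n    ".join(parts) + " ."


print(to_turtle("Book", "<http://example.org/book/1>",
                {"title": "An Example Book", "publish date": "2014-03-04"}))
```

A record without a title raises `ValueError`, which mirrors the "title is mandatory" constraint defined in step 1.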

Metadata schema designers revise the metadata schema each time they verify it by creating and using 
metadata. Through validation, they often discover problems in their metadata schema, e.g., difficulty 
entering appropriate values, unsatisfied requirements and so forth. Iterative prototyping of a metadata 
schema is useful both for designing the schema and for testing whether it meets the requirements. On the 
other hand, this iterative process is very time-consuming. Basically, metadata schema designers have to 
input metadata one by one. In addition, the more complex a metadata schema becomes, the more its designers 
need a metadata-editing tool to input metadata properly. 

The metadata schema design process is similar to agile software development, a software development 
process based on iterative and incremental development. We apply the agile software development method 
to metadata schema design, helping metadata schema designers carry out schema design in a process 
analogous to agile software development. 


[Figure 1 (diagram): an iterative cycle. Step 1: define a rough schema (domain model, metadata attributes, 
metadata value constraints). Step 2, based on the definitions in step 1: define details to express metadata 
in RDF (find metadata terms; convert the domain model to metadata structures). The cycle continues with 
"Create and use metadata" and "Improve the metadata schema".] 

Figure 1: The Process of Designing Metadata Schema 


[Figure 2 (diagram): definitions of step 1 (Book with title and publish date, Author with name and birth 
day, and the constraint "title is mandatory") are converted to definitions of step 2, in which publish date 
is expressed with dcterms:issued, name with foaf:name, and birth day with foaf:birthday.] 

Figure 2: An Example of Definitions of Step 1 and Step 2 


3 Support System for Designing Metadata Schema Based on Agile Development Model 


The support system enables metadata schema designers to design a metadata schema in the same way as 
they design an RDB schema, following the agile process. Its intended users are thus metadata schema 
designers who are familiar with designing RDB schemas but not RDF metadata schemas. We adopt three ideas 
to address the issues described in the previous section. 


Metadata schema designers execute step 1 by designing an RDB schema for a metadata-editing tool. 

Using the RDB schema of the metadata-editing tool, the support system helps metadata schema 
designers execute step 2. 

Using a metadata schema, the support system helps metadata schema designers develop a 
metadata-editing tool. 


Based on these ideas, the support system will facilitate the conversion of the RDB schema of the 
metadata-editing tool into an RDF metadata schema. It will also help with the development of 
an experimental metadata-editing tool. 

Figure 3 shows the iterative cycle of designing metadata schema and the development of the 
metadata-editing tool. Figure 4 shows the detailed metadata schema design cycle and the role of the support 
system. The support system provides two kinds of support functions to metadata schema designers: 


Support 1. Proposing a draft metadata schema based on the RDB schema of the metadata-editing 
tool 

Support 2. Providing a template which is the base of the metadata-editing tool from the metadata 
schema 


In this approach, metadata schema designers first design an RDB schema for a metadata-editing tool. At this 
stage, the metadata-editing tool cannot create RDF metadata but can only be used to input 
metadata. Metadata schema designers execute step 1 through this task. Second, the support system proposes 
a draft metadata schema based on the RDB schema of the tool. The support system performs two main 
tasks to create a draft metadata schema: 


Searching for terms for properties or classes to express in RDF, using the table names and column 
names in the RDB schema. 
Converting tables and their relations in the RDB schema to metadata structures. 
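
The two tasks can be sketched as follows. This is a minimal illustration, not the authors' Ruby library: the in-memory `VOCABULARY` stands in for a term search against a vocabulary service, and the schema layout and the names `find_terms` and `propose_draft` are hypothetical.

```python
# Hypothetical stand-in for a vocabulary search service: label -> candidate terms.
VOCABULARY = {
    "name": ["foaf:name", "rdfs:label"],
    "birthday": ["foaf:birthday"],
    "title": ["dcterms:title"],
    "author": ["foaf:Person", "dcterms:creator"],
    "book": ["bibo:Book"],
}


def find_terms(identifier):
    """Task 1: look up candidate terms for a table or column name."""
    key = identifier.lower().replace("_", " ").strip()
    return VOCABULARY.get(key, [])


def propose_draft(rdb_schema):
    """Task 2: turn each table into a structure with candidate terms.

    Key columns (primary and foreign keys) carry no descriptive metadata,
    so they are left out of the property list.
    """
    draft = {}
    for table, spec in rdb_schema.items():
        keys = set(spec.get("keys", []))
        draft[table] = {
            "class_candidates": find_terms(table),
            "properties": {
                col: find_terms(col)
                for col in spec["columns"] if col not in keys
            },
        }
    return draft


schema = {
    "Author": {"columns": ["id", "name", "birthday"], "keys": ["id"]},
    "Book": {"columns": ["id", "title", "author_id"], "keys": ["id", "author_id"]},
}
for table, entry in propose_draft(schema).items():
    print(table, entry)
```

The draft only lists candidates; choosing among them, or supplying a term when the list is empty, is left to the metadata schema designer.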


After that, metadata schema designers complete the metadata schema based on the draft. The 
support system then provides a template, which is the base of the metadata-editing tool, from the metadata 
schema. Once a metadata-editing tool that can create RDF metadata according to the metadata 
schema has been developed, metadata schema designers have completed one cycle and can validate their 
metadata schema. If validation shows that the metadata schema needs improvement, metadata schema 
designers modify the RDB schema of the tool and start the next cycle. 


[Figure 3 (diagram): a cycle linking "Designing of a metadata schema" and "Development of a 
metadata-editing tool" via "Reflection of the metadata schema" and "Reflection of the tool", each with 
"Validation/Improvement".] 

Figure 3: The Cycle of Designing Metadata Schema and Development of the Metadata-editing Tool 


[Figure 4 (diagram): the design cycle between the metadata schema designer, the metadata-editing tool and 
the support system: 
(1) Develop the metadata-editing tool (metadata schema designer) 
(2) Validation/improvement of the tool (metadata schema designer) 
(3) Get the RDB schema (support system) 
(4) Propose a draft of a metadata schema (support system, Support 1) 
(5) Design the metadata schema (metadata schema designer) 
(6) Get the metadata schema (support system) 
(7) Validation/improvement of the metadata schema (metadata schema designer) 
(8) Provide a template for a metadata-editing tool (support system, Support 2)] 

Figure 4: Details of the Metadata Schema Design Cycle 


4 Implementation 


An iterative process like agile software development seems very time-consuming at first glance. 
Recently, however, useful support tools have helped engineers develop software easily and quickly. Ruby on 
Rails¹ is a well-known web application framework. The support system assumes that metadata schema 
designers use Ruby on Rails to develop the metadata-editing tool. 

We develop two Ruby libraries. One proposes a draft metadata schema based on the RDB 
schema of the metadata-editing tool (support 1); the other provides a template for the 
metadata-editing tool from the metadata schema (support 2). A draft metadata schema is an incomplete 
Description Set Profile (DSP) (Nilsson, 2008), which allows metadata schema designers to define detailed 
constraints on metadata. In a draft DSP, candidate metadata terms, found by searching the dataset behind 
the SPARQL endpoint of Linked Open Vocabularies (LOV)², are described in addition to the metadata 
structures. The template for a metadata-editing tool is a Rails Application Template³, which lets users 
skip some development steps. 


1 http://rubyonrails.org/ 


5 Examples of Metadata Schema 


The support system converts RDB schemas to metadata schemas and metadata schemas to 
RDB schemas. This paper focuses on the conversion from RDB schemas to metadata 
schemas. The support system decides how to convert them based on conditions such as the relationships 
between tables, the number of columns in a table, and users’ requests. 

Figure 5 shows an example of the conversion for a one-to-many relationship. In this case, 
the number of columns in the table Member other than the primary key and foreign key is two, i.e., more 
than one. Therefore, the table Member becomes a resource or a blank node, and users judge whether this 
is appropriate. 

Figure 6 shows an example of the conversion for a one-to-one relationship. In this case, 
users first judge whether the table Place should have a structure or not. Figure 6 shows the case in which 
users judge that a structure is not needed. Thus, the resource Organization has two properties, one for 
latitude and one for longitude, instead of the resource Place. 

Figure 7 shows an example of the conversion for a many-to-many relationship. This is a 
very complex case. The table Writing has no columns other than the primary key and foreign keys. Thus, in 
the metadata schema, the resources Book and Author have properties that relate them to each other. 
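
The three cases can be summarised as a small decision function. This is only a sketch of the conditions described above; the function name, its parameters, and the returned labels are hypothetical, and the branches that the text does not cover (marked in the comments) are assumptions.

```python
def propose_conversion(relationship, non_key_columns, wants_structure=True):
    """Suggest how a related table could be expressed in the metadata schema.

    relationship    -- "one-to-many", "one-to-one" or "many-to-many"
    non_key_columns -- columns of the related (or join) table that are
                       neither primary nor foreign keys
    wants_structure -- the user's request in the one-to-one case
    """
    n = len(non_key_columns)
    if relationship == "one-to-many":
        if n > 1:
            # Figure 5: Member has two such columns; the user confirms the choice.
            return "resource or blank node"
        # Assumption: this case is not covered by the text.
        return "single property on the parent resource"
    if relationship == "one-to-one":
        if wants_structure:
            # Assumption: the text only illustrates the opposite choice.
            return "separate resource"
        # Figure 6: latitude and longitude become properties of Organization.
        return "properties on the parent resource"
    if relationship == "many-to-many":
        if n == 0:
            # Figure 7: Writing has only keys, so Book and Author link directly.
            return "properties relating the two resources"
        # Assumption: this case is not covered by the text.
        return "resource or blank node for the join table"
    raise ValueError(f"unknown relationship: {relationship}")


print(propose_conversion("one-to-many", ["name", "birthday"]))
print(propose_conversion("one-to-one", ["latitude", "longitude"], wants_structure=False))
print(propose_conversion("many-to-many", []))
```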


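[Figure 5 (diagram): the tables Organization and Member (name, birthday, with the foreign key 
organization_id) in a one-to-many relationship, and the resulting metadata structure, in which name is 
expressed with foaf:name and birthday with foaf:birthday.] 

Figure 5: Example 1 of Metadata Schema: One-to-many 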


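2 http://lov.okfn.org/dataset/lov/ 
3 http://guides.rubyonrails.org/rails_application_templates.html 


[Figure 6 (diagram): the tables Organization and Place (latitude, longitude) in a one-to-one relationship; 
in the resulting metadata structure, latitude and longitude are properties of Organization.] 

Figure 6: Example 2 of Metadata Schema: One-to-one 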

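[Figure 7 (diagram): the tables Book (id, title, publisher) and Author (id, name, birthday) in a 
many-to-many relationship; the terms shown in the resulting metadata structure include foaf:name, 
foaf:birthday and frbr:creatorOf, which relates the two resources.] 

Figure 7: Example 3 of Metadata Schema: Many-to-many 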


6 Related Work 


Malta & Baptista (2013) have proposed a method for the development of DCAP (Me4DCAP). They also 
adopt an iterative life-cycle development model. However, their research does not cover support for the 
actual development of metadata schemas using their method. 

Many studies propose mapping methods to convert an RDB schema to an RDF metadata schema (Hart, 
Reif & Gall, 2011). However, their metadata schemas are almost always expressed in the Web Ontology 
Language (OWL). We describe a metadata schema with DSP, which is more specific than OWL in expressing 
constraints on metadata structure. 


7 Conclusion 


This is work in progress, so our next step is to evaluate the support system. For the evaluation, we will 
verify whether metadata schemas can actually be designed following the conversion patterns shown in this 
paper. 


1 Introduction 


Research activities in all fields produce data in different forms and shapes. Advances in computing 
technology allow the data produced to be stored in a digital format. In addition, more and more historical 
records, which were originally captured on paper, are being digitized. The move towards digital data is 
ubiquitous; it introduces new ways of making the data available to the public and of reusing them (Borgman, 
2008, 2009). Nevertheless, research data are very often not shared at all, or are shared only on a 
researcher’s or university’s web page, which makes them less discoverable (LeClere, 2010; Nelson, 2009). 
Crosas (2011) argues that researchers are reluctant to share their data because traditional approaches do 
not facilitate control and ownership of the data by the author. In this poster we identify two other 
problems with current approaches (Section 2) and introduce our Col*Fusion system as a solution (Section 3). 


2 Motivation 


A number of tools (e.g., DataUp (Strasser, 2013)) and data repositories (e.g., ONEShare (Strasser, 2013), 
Dataverse Network (King, 2007), DataDryad (datadryad.org), DSpace (Smith et al., 2003), Socrata 
(socrata.com), Factual (factual.com)) have been developed to facilitate data sharing and preservation. 
Usually a data repository is a cloud service with a web interface that allows users to submit their data 
via a browser. Advantages of data repositories include ease of use, persistent storage, public distribution, 
recognition (through citation via a unique dataset identifier), and search for datasets based on metadata. 
Some repositories provide visualization tools and statistical analysis. The disadvantages of current 
approaches include repository isolation and dataset isolation within a repository. The former problem 
arises because some repositories are created only for specific research areas, journals or universities. 
Users therefore need to know where to find the dataset they are interested in and where to submit 
their own dataset. Dataverse Network and Datalib (datalib.org) attempt to solve the problem by allowing 
users to search within a set of repositories: the first does so automatically, as all dataverse networks are 
connected, and the second allows users to create and curate records that describe data repositories that 
users can search. The latter problem, dataset isolation, is to the best of our knowledge present in all data 
repositories. 

The dataset isolation problem is perhaps best shown by an example. Interdisciplinary research, 
which is becoming more common and more often funded, touches several areas. As a result, to answer an 
interdisciplinary research question, a researcher might need data that were produced by different 
research projects and stored in separate datasets. To answer her question, the researcher would need to 
find all related datasets manually and then merge them herself. Suppose that the required data are stored in 
datasets D1 and D2. D1 and D2 might not be directly related to each other (they might not share any common 
variables); however, they might be related via other datasets, for example D1 is related to D3, D3 to D4, 
and D4 to D2. Once found, those datasets need to be integrated. Data integration is a rather complex 
procedure consisting of several activities such as schema matching, record linkage, query execution and 
search over integrated sources, and keeping track of lineage and provenance. The integration is even more 
complex if the datasets come in different formats. 

The data integration problem has been of interest to both academic and industrial research for the 
last 30 years. Architectures of current data integration systems vary from warehousing to virtual 
integration, which leaves the data at the sources and accesses it at query time (Doan, Halevy, & Ives, 2012). 
To address the limitations of the top-down approach with one global mediated schema, Peer-to-peer (P2P) 
systems (Ng, Ooi, Tan, & Zhou, 2003; Halevy et al., 2004; Wang, Rabsch, Kling, Liu, & Pearson, 2007) and 
Collaborative Data Sharing Systems (CDSS) (Green et al., 2007; Talukdar, Ives, & Pereira, 2010) have been 
proposed and developed. However, traditional data integration systems, P2P systems, and CDSS share two 
disadvantages: they take a long time to set up, and they are hard to use because they require users to have 
certain expertise. 


3 Col*Fusion 


Recognizing the problem, we introduce Col*Fusion, a novel architecture for large-scale data integration, 
fusion and preservation based on crowdsourcing. Col*Fusion can be thought of as an interdisciplinary data 
repository in which the datasets are not isolated but connected. In fact, each dataset can be seen as a 
piece of a bigger puzzle that describes the world from one perspective. Over time, Col*Fusion connects the 
pieces to complete the puzzle. We have implemented Col*Fusion as a web-based application that provides 
an easy-to-use, uniform interface for data submission and integration. 

The data submission module allows users to submit data from heterogeneous sources and formats, such 
as Excel, SPSS and CSV files, and dump files from MySQL, PostgreSQL and Microsoft SQL databases. In fact, 
the set of supported file formats and file organizations can be expanded by Col*Fusion users. We use Pentaho 
Data Integration (Casters, Bouman, & Van Dongen, 2010) (aka Kettle) on the back end for extracting, 
transforming and loading (ETL) data into the Col*Fusion repository. Kettle is free and open source; it allows 
users to specify ETL tasks in an intuitive, graphical, drag-and-drop design environment and to save them as a 
transformation file. Kettle supports a large number of data sources, including leading Hadoop distributions, 
NoSQL databases, and other big data stores. Col*Fusion users can create a custom Kettle transformation 
and submit it to Col*Fusion. Some formats are more common than others, so Col*Fusion users can share 
Kettle transformations with other users to handle a particular file organization. As a result, most users 
do not need to do much preparatory work to submit their datasets, which makes Col*Fusion easier to use. 

Once a dataset is submitted, Col*Fusion mines relationships automatically. Col*Fusion relationships 
can be thought of as foreign keys in the relational data model. Currently, the automatic relationship mining 
algorithm establishes a relationship between two datasets D1 and D2 if they share common variables. 
Relationships can also be added manually by users if Col*Fusion cannot find them automatically because 
the variable names differ or because the relationship involves mapping several variables to one (e.g., D1 
might have a date split into three variables, whereas D2 might have it as one variable). 
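
A minimal sketch of this rule: two datasets are related when they share at least one variable name. The case-insensitive comparison, the data layout, and the function name `mine_relationships` are assumptions made for illustration; the poster does not describe the algorithm in more detail.

```python
from itertools import combinations


def mine_relationships(datasets):
    """Return {(name_a, name_b): shared variable names} for related datasets.

    datasets maps a dataset name to the list of its variable names.
    Names are compared case-insensitively (an assumption of this sketch).
    """
    normalised = {
        name: {v.strip().lower() for v in variables}
        for name, variables in datasets.items()
    }
    relationships = {}
    for a, b in combinations(sorted(normalised), 2):
        shared = normalised[a] & normalised[b]
        if shared:
            relationships[(a, b)] = sorted(shared)
    return relationships


datasets = {
    "D1": ["YEAR", "MONTH", "COUNTRY", "PRECIPITATION", "STATE"],
    "D2": ["State", "Year", "Population"],
    "D3": ["Species", "Habitat"],
}
print(mine_relationships(datasets))
# {('D1', 'D2'): ['state', 'year']}
```

Matching on names alone finds no relationship when the same concept is named differently in two datasets, which is the case the manual option described above is meant to cover.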

Each relationship has a name, a description, and an average confidence associated with it (Fig. 1). 
Confidence values for relationships are provided by Col*Fusion users and reflect their belief that the 
relationships hold. Relationships consist of links, the actual connections between variables in datasets. 
Each link has two automatically calculated numerical values associated with it. The values reflect the 
data overlapping ratios at the two ends of the link. For example, both D1 and D2 might have “State” variables 
that denote a political entity forming part of the USA. However, D1 might have full state names whereas D2 
might have state name abbreviations. Without knowledge of the mapping between full state names and 
abbreviations, the data overlapping ratios would be 0 and merging D1 and D2 would yield an empty dataset. 
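
The effect can be reproduced with a small sketch. The poster does not give the formula, so the definition used here (the share of distinct values at one end of a link that also occur at the other end) and the function name `overlap_ratios` are assumptions.

```python
def overlap_ratios(values_a, values_b):
    """Return (ratio_a, ratio_b): the fraction of distinct values at each
    end of a link that also appear at the other end."""
    a, b = set(values_a), set(values_b)
    common = a & b
    ratio_a = len(common) / len(a) if a else 0.0
    ratio_b = len(common) / len(b) if b else 0.0
    return ratio_a, ratio_b


full_names = ["South Carolina", "Florida", "Alabama"]
abbreviations = ["SC", "FL"]
print(overlap_ratios(full_names, abbreviations))  # (0.0, 0.0)

# With a value mapping applied to one end, the same link overlaps.
to_abbreviation = {"South Carolina": "SC", "Florida": "FL", "Alabama": "AL"}
mapped = [to_abbreviation[name] for name in full_names]
print(overlap_ratios(mapped, abbreviations))
```

Keeping one ratio per end makes asymmetric links visible: after mapping, every abbreviation in the second list has a partner, while only two of the three mapped names do.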


[Figure 1 (screenshot): a relationship with Name "autogenerated", Description "based on column name" and 
Average Confidence 1, whose link connects the State variables of a precipitation dataset (full state names 
such as South Carolina) and a population dataset (abbreviations such as AL).] 

Figure 1: Automatically discovered Col*Fusion relationship between two datasets 


A relationship’s links can involve data transformations assigned by users (or automatically). One type of 
transformation is synonyms. Synonyms are used to specify a mapping between variables on a value basis (Fig. 
2). One advantage of this type of transformation is that it can express many-to-many 
mappings. A transformation can also be specified as a function that is applied to each value of 
a variable, for example, date conversion from DD-MM-YYYY to MM-DD-YYYY format. 
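
Both kinds of transformation are easy to sketch. The function names and the representation of synonyms as a dictionary of sets are assumptions made for illustration, not a description of how Col*Fusion stores them.

```python
from datetime import datetime


def convert_date(value):
    """Transformation function: DD-MM-YYYY -> MM-DD-YYYY."""
    return datetime.strptime(value, "%d-%m-%Y").strftime("%m-%d-%Y")


# Synonyms map a value to a set of equivalent values, so one value may have
# several synonyms and several values may share one (many-to-many).
SYNONYMS = {
    "South Carolina": {"SC"},
    "Florida": {"FL", "Fla."},
    "FL": {"Florida"},
}


def matches(value_a, value_b, synonyms=SYNONYMS):
    """True if the two values are equal or declared synonyms in either direction."""
    return (
        value_a == value_b
        or value_b in synonyms.get(value_a, set())
        or value_a in synonyms.get(value_b, set())
    )


print(convert_date("25-12-2013"))        # 12-25-2013
print(matches("South Carolina", "SC"))   # True
print(matches("Fla.", "Florida"))        # True
print(matches("Florida", "SC"))          # False
```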


[Figure 2 (screenshot): synonyms defined on the link between the STATE variable of the precipitation 
dataset and the State variable of the population dataset, mapping full state names to abbreviations (e.g., 
Florida to FL).] 

Figure 2: Example of synonyms transformation 


One disadvantage of the synonyms transformation appears when there is a large number of distinct values 
that are not matched, especially if the mappings are well known and might already be available (e.g., the US 
state mapping is well known). Col*Fusion can deal with such situations by traversing the relationship graph 
(Fig. 3). For example, let D3 be a dataset that contains US state mappings (D3 might be the result of 
research that is completely independent of D1 and D2 and submitted by other users). Then D1 and D2 can be 
merged via D3. In general, two datasets might be related to each other via several other datasets. 


[Figure 3 (screenshot): the precipitation and population datasets, for which no synonyms are defined, 
connected through a third dataset that lists US states and territories (e.g., ARIZONA, CALIFORNIA, 
DELAWARE); data overlapping ratios are shown on each link.] 

Figure 3: Example of relationship graph traversal to merge datasets 


Users can use relationships to see how their datasets are related to other datasets and to “move” from 
one dataset to another; relationships are also used when users search in Col*Fusion. Col*Fusion 
maintains a schema graph in which vertices represent data tables and edges represent relationships. When 
a user posts a keyword query, the system performs three steps to answer it. First, Col*Fusion finds all 
datasets (vertices in the graph) that contain the keywords. Second, it traverses the schema graph to find all 
paths between the vertices found in step one. Third, it translates each path to an SQL query by mapping 
every vertex to a dataset and every edge to an SQL join operator. Therefore, the result of the search is not 
just a list of datasets that have the variables the user is interested in, but rather a merged dataset, or a 
list of merged datasets if there are several possible paths along which to perform the merge. The list is 
ranked based on relationships’ confidence and data overlapping values (e.g., paths with higher average 
confidence and data overlapping values are ranked higher). 
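
The three steps can be sketched as follows. This is an illustration under simplifying assumptions, not Col*Fusion's implementation: the schema graph is a plain dictionary, only the shortest path between two matching datasets is considered (the system finds all paths), the table and column names are invented, and ranking is omitted.

```python
from collections import deque

# Hypothetical data: variables per dataset, and relationships as
# (dataset, dataset) -> (column in first, column in second).
VARIABLES = {
    "precipitation": ["year", "state_name", "precipitation"],
    "population": ["year", "state_code", "population"],
    "states": ["state_name", "state_code"],
}
EDGES = {
    ("precipitation", "states"): ("state_name", "state_name"),
    ("states", "population"): ("state_code", "state_code"),
}


def neighbours(node):
    """Yield datasets directly related to the given dataset."""
    for a, b in EDGES:
        if a == node:
            yield b
        elif b == node:
            yield a


def find_datasets(keyword):
    """Step 1: datasets that have a variable matching the keyword."""
    return [name for name, cols in VARIABLES.items() if keyword in cols]


def shortest_path(start, goal):
    """Step 2 (simplified): breadth-first search over the schema graph."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def path_to_sql(path):
    """Step 3: every vertex becomes a table, every edge a JOIN."""
    sql = f"SELECT * FROM {path[0]}"
    for left, right in zip(path, path[1:]):
        if (left, right) in EDGES:
            col_l, col_r = EDGES[(left, right)]
        else:
            col_r, col_l = EDGES[(right, left)]
        sql += f" JOIN {right} ON {left}.{col_l} = {right}.{col_r}"
    return sql


start = find_datasets("precipitation")[0]
goal = find_datasets("population")[0]
print(path_to_sql(shortest_path(start, goal)))
```

In this toy graph the two matching datasets share no join column directly, so the generated query joins them through the intermediate `states` table.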

Col*Fusion provides provenance information in OPM format (Moreau et al., 2011) for merged 
datasets, so users know where each variable came from. In addition, merged datasets can be visualized, 
downloaded in several formats (regardless of original format) and shared with other users. 


4 Conclusion 

While Col*Fusion involves some labor from users, it addresses the dataset isolation problem, which has 
long resisted resolution. While datasets can be connected through analysis of file-level metadata 
about their sources and overall characteristics, there has not previously been an application that connects 
the variable-level metadata within datasets. Employing variable-level relationships between datasets allows 
third parties (users who are not experts in the area) to reuse the data across boundaries, build scientific 
knowledge and thus advance science. 


Abstract 

Project WHIPPET, funded by the European Association for Health Information and Libraries (EAHIL), 
aims to understand the diversity of information roles in the health sector. A pilot survey was distributed 
at the EAHIL workshop, Stockholm (2013). Ninety-eight questionnaires were distributed and 47 
completed responses were received (48% response rate). The results demonstrate the continued use of the 
terms ‘library’ and ‘librarian’. Key roles are teaching and training, literature searching, and management. 
A wide range of skills and attributes are needed to carry out these roles. Soft skills were mentioned most 
frequently, followed by LIS skills, management, and IT skills. Skills development needs were identified, 
with IT and new technologies cited most frequently, followed by management and pedagogical skills. 
Issues relating to budgets and finance were identified as a major challenge. Other challenges included 
staff issues, new technologies, keeping up-to-date and promoting services. Impact is primarily through 
teaching, research support and effective service management. The issues will be explored through a wider 
survey and analysis of focus groups and interviews. The findings will support future planning, training 
and development within the profession. 


1 Introduction 


Project WHIPPET¹ aims to record the experiences of health information professionals currently practising 
across Europe. The project is investigating the diversity of roles that exist in the health information sector; 
the skills that health information professionals have and need; and the critical nature of those roles and 
skills in supporting effective healthcare in a rapidly changing environment (Brettle & Urquhart, 2012). 


2 Pilot questionnaire 


A pilot survey was distributed at the EAHIL workshop in Stockholm, Sweden during June 2013, prior to a 
full survey being distributed via EAHIL and other mailing lists. Ninety-eight questionnaires were distributed 
and 47 completed responses were received, giving a response rate of 48%. The survey questions are provided 
in Appendix 1. The majority of questions were open-ended producing mainly qualitative data. The data 
were then coded; coding frequencies were calculated, enabling descriptive statistics to be produced. 
Thirty-four respondents (72%) were female and the majority (62%) were between 35 and 54 years of age. 
Respondents came from a wide range of European countries, with three respondents coming from further 
afield (the Caribbean, US and United Arab Emirates). Most respondents came from the education sector 
or state healthcare, or a combination of the two (87%). The remaining individuals worked in industry, 
independent organisations or government-funded bodies. 


1 http://projectwhippet.shef.ac.uk 


Figure 1: Job roles 


3 Results 


3.1 Job titles 

Thirty-three respondents (70%) had job titles that included the words ‘library’ or ‘librarian’, while five 
respondents (11%) had ‘information’ in their title. Twenty-one respondents (45%) had a title implying a 
leadership/management role and 4 (9%) had a title which implied an educational role. 


3.2 Job roles (Figure 1) 
The most frequently-mentioned role was teaching or training, with 29 respondents (62%) identifying this 
as a key role. Within this, 17 (36%) mentioned information literacy training specifically. Other types of 
training mentioned included data management training (1), critical appraisal training (1) and evidence- 
based medicine / research training (1). 

Literature searching was identified as a key role by 22 respondents (47%). Roles related to collection 
development and management were identified by 16 (34%). This covered a range of roles including 
purchasing resources, negotiating licenses, and providing access to the resources. 

Management roles such as strategic management, budget management, staff/competence 
management and quality management were mentioned by 15 respondents (32%). 


3.3 Key skills (Figure 2) 


Soft skills were seen as greatly in demand, with communication skills mentioned most frequently by 16 
respondents (34%) and generic people skills mentioned by 11 (23%). Pedagogical skills were also viewed as 
highly important, mentioned by 13 (28%). LIS-specific skills were not identified so frequently, with 
information literacy and/or search skills mentioned by 10 respondents (21%) and knowledge of sources 
mentioned by eight (17%). 

Management skills were mentioned by 10 respondents (21%). Some specific areas of management 
were identified such as people management (3, 6%); change management (2, 4%); financial management (2, 
4%); and strategic management (1). 

IT and technical skills were mentioned by 9 respondents (19%) and research skills by 7 (15%). 
Research skills were employed in a variety of different contexts. One respondent carried out his own 
research, while others specified that they used research skills to carry out surveys or performance 
management. Others found knowledge of research methodology useful in assessing evidence. 

The most frequently-mentioned personal attribute was willingness to learn or intellectual curiosity, 
mentioned by 8 respondents (17%). Comments relating to the changing skill-set, and to the perception that 
(some) information professionals do ‘everything’, recurred in the data. 

Other than willingness to learn, there seemed to be relatively little consensus about the personal 
attributes required for the job. This may reflect the wide range of roles carried out by respondents. 
Figure 2: Key skills 


3.4 Skills that need developing (Figure 3) 


IT and new technologies were mentioned most frequently as skills that need developing, by 15 respondents 
(31%) suggesting that they are perceived as a growth area for the future. Within this category, some specific 
areas were mentioned, such as m-libraries (3 respondents, 6%) and e-learning (3 respondents, 6%). 
Management skills were mentioned as an area for development by 7 respondents (15%). Within this 
category, the most frequently mentioned area was leadership skills (5 respondents, 10%), followed by 
strategic management skills (4 respondents, 8%) and budget management skills (3 respondents, 6%). Other 
skills needs identified were pedagogical skills (6 respondents, 13%) and research skills (5 respondents, 10%). 

Aside from the areas discussed above, there seemed to be relatively little consensus on the skills 
that need developing: a large number of areas were each identified by a minority of respondents. This may 
reflect the range of roles carried out by health information professionals. 

It is also notable that 13 respondents (28%) made reference to developing existing skills and/or 
keeping up-to-date. Comments included, “Always looking to hone existing skills” and “In different trends 
like research data - there is always something to develop!” This suggests that in many cases, respondents 
are building on knowledge rather than acquiring skills from scratch. 
Figure 3: Skills that need developing 


3.5 Challenges (Figure 4) 


The overwhelming challenge identified by respondents was budgets and finance, with 20 respondents (43%) 
identifying this as a challenge (14 budget, 5 paymasters, 1 budgeting skills). Other management challenges 
identified included staff and staff skills, with 14 respondents (30%) commenting on these issues. Other 
key challenges were promoting the service (11, 23%) and meeting user needs (9, 19%), with users 
sometimes not understanding the relevance of the service (6, 13%). In a fast-moving profession, keeping 
up-to-date presented respondents with problems, with 12 (26%) identifying this as a challenge and 13 (28%) 
mentioning new technologies specifically. This tallies with the findings presented in section 3.4, 
which showed that IT and new technologies constituted the most frequently mentioned area where skills 
development was needed. The challenges are perhaps accentuated when set against the frequently-identified 
problem of time and heavy workload (8, 17%). 
Figure 4: Challenges faced 


3.6 Making an impact 


Despite the challenges, 42 of the respondents were able to give an example of a time when they felt they 
had made an impact in their job. These fell into six categories (Figure 5), largely related to their key roles 
of teaching, supporting research, supporting users and the management of their services. They did not find 
it easy to identify direct impact on healthcare, with only one respondent acknowledging this. However, 
much of what they do supports research, knowledge, learning, and effectiveness within the healthcare 
organizations employing them, and is likely to contribute indirectly to more effective healthcare. 
Figure 5: Ways in which health information professionals make an impact 


4 Limitations 


The pilot survey sample was small, and it is therefore not possible to make statistical generalizations from 
the results. Moreover, the survey was distributed at the EAHIL conference, so respondents may not be 
typical of all health information professionals. Project WHIPPET focuses primarily on the European context 
and the results may therefore not be transferable to other geographical contexts. 

The main survey endeavours to address some of these limitations. A link to the online survey was 
distributed via the EAHIL mailing list and national mailing lists aimed at (health) library and information 
professionals. 512 usable responses have been received. In addition, analysis of focus group and interview 
transcripts will provide more in-depth data on health information professionals’ roles, skills and career 
paths, and on the contributions they make to effective healthcare. 


5 Conclusion 


The preliminary results demonstrate the continued use and relevance of the terms ‘library’ and ‘librarian’ 
in the health information sector. Other key terms used reflected leadership and educational roles held by 
those surveyed. 

Key roles included teaching and training, literature searching, and management roles. Health 
information and library professionals require a wide range of skills and attributes to enable them to carry 
out these roles. These can be categorized into soft skills, LIS-specific skills, management skills, IT and 
technical skills, and personal attributes. 

Throughout Europe, countries have identified the need for increased skills and professional 
standards to meet changing demands on health information professionals (Robu & Bakker, 2010; 
Tsalapatani & Kalogeraki, 2010). The findings of the present study contribute to this growing body of 
knowledge by identifying areas in which health information professionals felt their skills needed developing. 
IT and new technologies constituted the most frequently mentioned area; other areas identified were 
management skills, pedagogical skills and research skills. Comments suggested a commitment to lifelong 
learning among respondents. 

Despite facing a number of challenges — most significantly budgetary issues — respondents were able 
to identify various areas in which they felt they made an impact. This was primarily through supporting 
roles including research support, teaching and training, and through effective management of the services 
to support their users. Thus, the early indications from this study support the findings of Harrison, Creaser, 
and Greenwood (2011) in the Irish context: health information professionals are applying their specialized 
skill-set to add value and benefit across the health sector. 

8 Appendix 1: Survey questions 

1. What is your job title? 

2. What type of organisation do you work for? 
o State healthcare 
o Private healthcare 
o Charity / voluntary sector 
o Education sector (e.g. university, college) 
o Industry (e.g. pharmaceutical company) 
o Other (please specify) 

3. What country do you work in? 

4. Who are your main user groups? (e.g. clinicians, undergraduates, general public, etc.) 

5. Please briefly summarise the key elements of your role (e.g. literature searching, information literacy 
training, outreach, etc.) 

6. What are the main challenges you face in your role? 

7. What key skills are needed to do your job? 

8. How did you acquire these skills? Please tick all that apply 
o Within the workplace 
o University degree in library/information studies 
o University degree in other subject (please specify) 
o Short courses 
o Mentoring 
o Shadowing 
o Professional networks 
o Online learning 
o Other (please specify) 

9. Are there any areas in which you need to develop your skills? 

10. Please give an example of a time when you felt you made a real impact in your job 

11. As part of the project, we are developing a website to support information and knowledge sharing 
between health information professionals. What sort of resources would you like to see on the website? 

12. What is your age? 
o 18 to 24 
o 25 to 34 
o 35 to 44 
o 45 to 54 
o 55 to 64 
o 65 or older 
o Rather not say 

13. What is your gender? 
o Female 
o Male 
o Rather not say 

14. Do you have any comments about the design of this questionnaire? 

15. Would you be interested in participating in later stages of the research? Please tick one or more boxes 
o I would potentially be willing to take part in a face-to-face interview in the UK 
o I would potentially be willing to take part in an interview by Skype or instant messenger 
o I do not wish to participate further, but would like to be informed of the results 

If you have ticked any of the boxes above, please provide your email address 

Abstract 

For tenure-track faculty, mentoring can be an important source of information needed for success in their 
new career and institution. Although information behavior is central to the mentoring relationship, 
mentoring has not yet been examined through an information behavior lens. This study sought to fill 
this gap by investigating mentees’ perceptions regarding how they and their mentors share information, 
what motivates them to seek information, what barriers exist to their information seeking, and what they 
believe contributes to a successful mentoring relationship. Data were collected using a Web survey and 
follow-up interviews, both of which explored the mentoring experiences of tenure-track faculty at a major 
mid-Atlantic research university. Study findings suggest that the information seeking of mentees is akin 
to browsing in a document collection, that mentees’ information needs are fluid and highly contextualized, 
and that there are affective barriers to information seeking within the context of the mentoring 
relationship. 
1 Introduction 


Ironically, as mentoring programs become more popular on university campuses, not much attention is paid 
to what makes these programs most effective (Allen, Eby, & Lentz, 2006). Ideally, mentoring is the process 
of transferring cultural information about an organization. The mentor has knowledge of department politics 
and advice about how to reach goals that will accomplish the work and satisfy the tenure review committee 
(Palgi & Moore, 2004). Despite the central importance of this process of information transfer, however, 
mentoring has never been studied from the perspective of information behavior. Through the lens of 
information behavior theory, particularly as it deals with the affective qualities of information seeking, one 
may see that there are often barriers to information transfer between the mentor and the mentee. 

As an explicit professional development program, mentoring suffers from informality. Department 
administrators are reluctant to impose ideas of how a mentoring program should work, particularly as it is 
commonly believed that mentoring relationships should develop naturally, without administrative influence 
(Zellers, Howard, & Barcic, 2008). Faculty who are less comfortable forming interpersonal bonds may be 
less likely to reach out to the faculty they have been assigned to mentor, and their reluctance may be 
exacerbated if the faculty member is of a different race or gender (Stanley & Lincoln, 2005). Similarly, 
junior faculty who arrive on campus with little to no social network may find a host of reasons not to 
‘bother’ their mentor (Blickle, Schneider, Meurs, & Perrewé, 2010). 

It is the faculty now entering the system who will need to find answers to the questions that 
confront American universities. However, without mentoring, the pool of junior faculty who will be in a 
position to develop the new academy is likely to be much diminished. Mentoring gives junior faculty the 
support they need to make the transition from graduate school or post-doctoral training to a tenured faculty 
position. Through mentoring, faculty are much more likely to reach their full potential (Allen et al., 2006; 
Berk, Berg, Mortimer, Walton-Moss, & Yeo, 2005; Blickle et al., 2010). 

The study presented in this poster examined the perceptions of mentees from the perspective of 
information seeking behavior and barriers to information seeking. Based on the findings of this study, 
I offer recommendations that may assist faculty members in overcoming such barriers and help 
administrators in identifying mentoring best practices. 

Figure 1: Barriers to Information Seeking 


1.1 Literature Review 


Surveyed research about mentoring falls into three main categories: studies of how the mentor may affect 
the mentee (Blackburn, Chapman, & Cameron, 1981; Palgi & Moore, 2004; Ragins, 1997; Sugimoto, 2012); 
studies quantifying the characteristics of a specific program (Allen et al., 2006; Blickle et al., 2010; 
Thurston, Navarrete, & Miller, 2009), or the ideal program (Carey & Weissman, 2010; Hansman, 2003); and 
finally commentary pieces about how to choose a mentor (Ensher & Murphy, 2006; Hansman, 2003) and what 
junior faculty need from their mentors (Leslie, Lingard, & Whyte, 2005). 

Information behavior is at the heart of mentoring. Kuhlthau (2004) describes the value of an 
“invitational” mood in information seeking, in the sense that one is simply open to new ideas, and she 
contrasts this with the “indicative” mood, which leads one to conclusive actions. Under the constraining 
sense of the value of a mentor’s time, however, the mentee may never have the freedom to enter the 
“invitational” mood. As Taylor (1968) noted, describing what you don’t know to someone you don’t know 
all that well is a very complex act of communication. 

Junior faculty may also be constrained because they inhabit a culture which prizes organized 
thought. However, according to Bates’ (1989) berrypicking model, the search for information is a query that 
changes and evolves during the course of searching (Bates, 1989; Taylor, 1968). For a mentee in conversation 
with his or her mentor, the berrypicking model would suggest the freedom to change the subject, to follow 
up on a chance remark, or to make conceptual connections of dubious logic. Models like berrypicking and 
information patches (Pirolli & Card, 1999) acknowledge the contextual nature of the information need, 
and the way that need evolves over time. These models can inform the mentee-mentor relationship in 
fruitful ways, by creating space where the transfer of cultural information can take place. 


1.2 Research Questions 


This study sought answers to the following research questions from the perspective of mentees at a major 
mid-Atlantic University: 


RQ 1. How do people share information within the context of their mentoring relationship? 
RQ 2. What motivates people to look for information within the context of the mentoring relationship? 
RQ 3. What are the barriers to information seeking within the context of the mentoring relationship? 
RQ 4. What makes for a successful mentoring relationship? 


1.3 Methodology 

In this poster, I report results from a mixed-method study of the tenure-track faculty at a mid-Atlantic 
research university, in which faculty perceptions of the mentoring they have received are examined from 
the perspective of information behavior. Every tenure-track faculty member in the University was invited 
to complete an on-line survey (response rate 28%, n=102), and following completion of the survey, faculty 
were invited to be interviewed (29% of survey respondents volunteered, n=9). Survey responses were 
analyzed using quantitative techniques, while transcripts from the interviews were analyzed using 
qualitative techniques. 


1.4 Preliminary Findings 


Mentors and mentees share information using tools like email, but there is also an emphasis on meeting face 
to face. Mentees are motivated to seek information from their mentors because they recognize in themselves 
a knowledge gap. Particularly emphasized was the idea of learning from the experience of others; mentors 
were described as experienced in navigating the institution. The extent to which the mentor is perceived as 
too busy can be a barrier to information seeking, as can other elements of the mentoring relationship. 
Meanwhile, the successful mentoring relationship is a product of many things, but perhaps most important 
is the mentor’s personality and the common experiences that he or she may share with the mentee. 
The findings represented in my poster are [...] mentoring for junior faculty members, and the kinds of 
information they seek from their mentors. In the context of this study, the mentor is a repository for 
information of many kinds, including career and psychosocial dimensions (Kram, 1985). The mentee is 
constrained by the bounds of his or her small world, to the extent that little is known of research or 
projects outside the department. Because of the exigencies of teaching, recruiting graduate students, 
applying for grants, and developing research programs, the mentee has little freedom to look for 
information on how to accomplish all these tasks; the mentor must serve as the library shelf. 

Figure 2: Adjectives describing a successful mentoring relationship from the mentee perspective. 

The findings from this study demonstrate that information transfer between mentor and mentee is vastly 
improved when there is a positive relationship between the two. In order to develop that relationship, a 
certain amount of time must be committed to the mentoring process, and much of that time must be spent 
in face-to-face meetings. Mentor and mentee need not be friends, but they must be comfortable 
acquaintances in order to freely transfer information. 


2 Future Work and Conclusion 

In the immediate future, additional research is planned to survey the professors who serve as mentors within 
the same University. The present study evaluates the mentoring relationship only from the perspective of 
the mentee. However, mentors are also likely to have information needs and perhaps to encounter barriers 
in their information seeking and in their information sharing. 
Abstract 

This contribution presents the design of a research project in its initial stages. The project Green Search 
investigates the shaping of environmental information, including information on problems and proposed solutions, 
through their representation in search engine results, in social media tools, and in mobile applications 
dedicated to environmentally friendly living and consumption and how this is experienced by people 
using these tools. The project is situated in a socio-technical framework, which sees technology and 
society as mutually dependent and co-constructed. The following four research questions, organized in 
two interlinked parts, guide the study: Part I: Configurations - How are specific environmental issues 
with bearing on everyday life practices configured through web search and recommendation services and 
in mobile applications facilitating environmentally friendly living? In which ways do users judge mediated 
personal recommendations (through social media), search engine results and information from dedicated 
mobile applications for environmentally friendly living? Part II: Trust - How is trust attributed to the 
information retrieved/received on environmental issues with bearing on everyday life practices, 
specifically considering how different sources are seen to relate to each other? Which interests, 
organizations or link relations are perceived as trustworthy and how is this motivated? This is 
investigated in relation to two thematic areas: food and the home. The presented project uses a mixed 
method approach, with qualitative methods (focus group interviews) being supplemented with 
quantitative elements (web analyses). 
1 Background 

Design of a research project in its initial stages. 
Online environments are an ever more important arena for information 
on environmental issues, which ranges from reports on environmental destruction to advice on how to 
address it on institutional levels and in individuals’ everyday life. The project Green Search investigates 
the shaping of environmental information, including information on problems and proposed solutions, 
through their representation in search engine results, in social media tools, and in mobile applications 
dedicated to environmentally friendly living and consumption and how this is experienced by people using 
these tools. 

The research project links together three problem areas of relevance for environmental information 
online: 


- Firstly, environmental problems tend to be controversial and are given meaning in different ways 
depending on interests, allegiance and context of who presents them (Carvalho, 2007). 

- Secondly, the role of information in alleviating environmental problems, specifically on an 
individual level, is unclear. Studies on everyday life practices in particular have shown that what 
people report knowing about environmental problems on an abstract level is often disconnected 
from their practices in everyday life (cf. Bartiaux 2008; Haider, 2011; Hobson, 2003; Shove 2005) 
and even strong values do not necessarily directly translate into practices in all cases (Nathan, 
2012). 

- Thirdly, search engines and mobile applications work ever more with different types of 
personalization, including location awareness, to increase relevance and utility (cf. Halavais, 2009; 
Feuz et al., 2011; Hannak et al. 2013). Most social media through their profile-centered approach 
have personalization as their starting point (boyd & Ellison, 2007; Ellison & boyd, 2013). 


The study’s interest lies at the intersection of these three problem areas. This is of relevance for the shaping 
of environmental information in online environments, as it emerges in search engine results and social media 
feeds, specifically of information with bearing on everyday life practices. 

Environmental information online stems from very diverse sources. It includes for instance official 
campaigns, social marketing and promotional activities by interest organizations and businesses, media 
reports, reports by lobby organizations or political parties, and not least exchange of personal experiences 
and opinions in forums or social media. All these disparate sources are united in the search engine result 
page or are blended with other content in a social media feed when representing an issue of relevance for 
the environment. Together and with the information spaces they all bring in through their sites’ in- and 
out-links they shape the topic at hand. Furthermore, due to mechanisms of personalization and localization, 
results and ranking order are not identical between different searchers. In the case of social media, including 
social network sites, blogs and micro blogs, how an environmental problem appears, including suggestions 
for how to address it, varies potentially even more from user to user. This, i.e. the way in which different 
sources are blended as well as personalized and filtered, has bearing on how certain issues are perceived and 
which ways for addressing them seem motivated and feasible. In addition, people’s different life and work 
situations contribute to how an issue is interpreted and made sense of. Specifically, questions of which 
sources to trust and how to judge trustworthiness come to the fore. 


2 Theoretical framework 


The project is situated in a socio-technical framework, which sees technology and society as mutually 
dependent and co-constructed (cf. van House, 2004; Suchman 2007). Accordingly, the study’s interest lies 
with how search engines and other tools for retrieving/receiving information shape environmental issues 
and how people perceive this and make sense of it in their everyday life and everyday practices. In line with 
the theoretical framework, the focus is on the very intersection of technical tools, people and the 
environment as a problem space and as a doing space. 

For example, Google is not just a neutral tool for information retrieval, but since its algorithms 
control most online visibility it structures how we see issues and contributes to shaping what we regard as 
important. Ultimately how certain issues are presented in and by Google is an important part of the very 
issues at stake (Eklöf & Mager, 2012) and of how they are perceived. Similarly, social media like Facebook, 
blogs or Twitter and even dedicated mobile applications afford certain ways of using them while various 
interests, commercial or otherwise, contribute to how issues are represented through them. 


3 Research questions 
The following four research questions, organized in two interlinked parts, guide the study: 

Part I: Configurations 

- How are specific environmental issues with bearing on everyday life practices configured through 
web search and recommendation services and in mobile applications facilitating environmentally 
friendly living? 

- In which ways do users judge mediated personal recommendations (through social media), search 
engine results and information from dedicated mobile applications for environmentally friendly 
living? 

Part II: Trust 

- How is trust attributed to the information retrieved/received on environmental issues with bearing 
on everyday life practices, specifically considering how different sources are seen to relate to each 
other? 

- Which interests, organizations or link relations are perceived as trustworthy and how is this 
motivated? 


4 Thematic areas & empirical delimitations 


The project is organized around two empirically focused areas. Both take up thematic areas, which are 
central for the discursive and socio-material organization of environmentally friendly living in Western 
consumer societies as a site of contestation (Humphery 2011; Lewis & Potter, 2012) and where the role of 
information is potentially significant. 


- Food 
- Home/housing 


These areas are indicative of the increasingly individualized character that has come to structure 
environmentally friendly living as a site of engagement. At the same time, they are noticeably situated 
within a larger problem space involving socio-economic structures of production and consumption, including 
use of resources and energy. Furthermore, both thematic areas, food and the home, are connected to various 
kinds of materials and material practices, which are relevant for environmentally friendly everyday living 
(cf. Hobson 2006; Shove 2005). Taken together, this makes these areas particularly suitable for studying 
aspects of information on environmental issues with bearing on everyday life practices. 


5 Method 


The project uses a mixed method approach, with qualitative methods being supplemented with quantitative 
elements. 


5.1 Qualitative focus group interviews 


Interviews are carried out with four focus groups consisting of 6 to 8 informants each. The small group 
size is motivated by the fact that the group members are supposed to use digital technology and reflect on 
their use. The focus groups are selected to represent a young to middle-aged (25-40 years) population of both 
sexes in Sweden, i.e. people born in the 1970s and 1980s who grew up in today’s consumer society with 
increased consumption levels as well as environmental awareness becoming common. Sweden is a suitable 
place for this study since it is a country where IT literacy and Internet penetration are high (Findahl, 2012) 
at the same time as a certain environmental discourse is considered mainstream (Isenhour, 2013). 

Settings where people meet in already existing small groups independent of the research at hand 
are used as focus groups: (1) a parenting group, (2) a book club, (3) a discussion group of master level 
university students, (4) a group of expats in Sweden meeting through a social networking platform. By 
using already existing groups as focus groups a conducive atmosphere can more easily ensue; furthermore 
this suits the purpose of making visible socially structured interpretative frameworks (Jenkins et al. 2010), 
which is an advantage of using focus group based research. 

In order to provide a starting point, akin to a talking stimulus in vignette-based interviewing 
(Jenkins et al. 2010), the informants are presented with a real-life blog post and a mainstream media report 
(Salazar & Orobitg, 2012) available online on (a) the role of food for environmentally friendly living and 
(b) the practicalities of low energy housing. The informants are asked to use their mobile phones or other 
devices (e.g. laptop computers, iPad etc.), which they are encouraged before the meeting to bring with 
them, to support their discussions. They are asked to comment on their searches and search results and also 
to compare those with each other’s. They are also encouraged to draw on their own personal social networks. 
They are furthermore presented with a list of relevant mobile phone applications to install and test if they 
wish. 

Each group meets twice. At the second meeting the informants are presented with the results of a 
number of quantitative web analyses carried out on some of the material searched on and discussed in the 
first meeting. These visualizations and the underlying lists then serve as mediators for the ensuing discussions.


5.2 Web analysis on search results (Google) 


Web analyses are carried out using digital tools developed and maintained by the Digital Methods Initiative: 
https://wiki.digitalmethods.net/Dmi/

A selection of 5-7 search results from each focus group meeting is saved and cleaned using the Digital
Methods Initiative Harvester to retain cleaned URLs from the Google results. These are then analyzed further
in order to establish which organizations/actors and issues dominate the search results at hand and how
the information space is structured regarding direct links and co-links. This is achieved by using the
IssueCrawler tool for establishing and drawing networks of direct links and of co-links in two iterations. It
is furthermore supplemented by applying the Issue Discovery tool, which produces lists of the most relevant
terms and phrases in a submitted set of URLs. These will be visualized in the form of a tag cloud.
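The idea of a co-link network can be illustrated with a small sketch. It is not the IssueCrawler implementation; it assumes that a co-linked actor is a target that receives links from at least two distinct starting points, and all host names and the helper `colinked_targets` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical outlinks harvested from three starting points (all hosts invented).
outlinks = {
    "seed-a.example": {"ngo.example", "agency.example", "blog.example"},
    "seed-b.example": {"ngo.example", "agency.example"},
    "seed-c.example": {"agency.example", "shop.example"},
}


def colinked_targets(outlinks, min_seeds=2):
    """Return the targets that are linked from at least `min_seeds` distinct seeds."""
    inbound = defaultdict(set)
    for seed, targets in outlinks.items():
        for target in targets:
            if target != seed:  # ignore self-links
                inbound[target].add(seed)
    return {t: sorted(s) for t, s in inbound.items() if len(s) >= min_seeds}


colinks = colinked_targets(outlinks)
# Order actors by the number of seeds pointing at them (ties broken alphabetically).
ranking = sorted(colinks, key=lambda t: (-len(colinks[t]), t))
print(ranking)
```

An ordered list of this kind is one possible input for the discussion of which actors dominate an information space.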

The results, in the form of ordered lists as well as networks, tag clouds and other visualizations,
have two purposes. Firstly, they are the basis for establishing which actors dominate certain information
spaces and hence for investigating how specific environmental issues are configured through web search,
recommendation services and in mobile applications. Secondly, they provide a starting point for follow-up
discussions with the focus groups as outlined above and thus contribute to investigating issues of judgment
and trustworthiness regarding online environmental information.
Abstract

Numerous studies have examined the role of gender in game design, game play and game experience and 
conclude that women are often excluded and objectified in character design, appearance and behavior. 
Game and gender studies scholars encourage further research in these topics. However, the analysis
and critique of these findings place little to no emphasis on plans for implementation or on suggestions
for changing the use of stereotypes in game and character design, sexism in
game culture and the inclusion of women in STEM related fields. This paper provides insights into the
importance of gender roles and character design and representation in video games in relation to creating
inclusive gaming environments for women.


1 Problem Question


How can character design and representation in video games be used to create inclusive gaming
environments for women?


2 Introduction 


Women make up forty-seven percent of all console, PC, and mobile game players (ESA 2012). Although
the growing presence of women in gaming culture is encouraging, most top-selling games reinforce gender
stereotypes and inequalities embedded in our society (Kafai, et al., 2008, p. 11). Female characters are often
hyper-sexualized through body appearance and revealing clothing in games that “overstress young, buxom
and beautiful women in their content” (Kafai, et al., 2008, p. 11). In turn, these negative connotations
discourage women from playing “hardcore” games, a term used by gamers to describe games that require
significant time and dedication to complete successfully, and diminish their interest in
pursuing careers in the male-dominated game industry. This is supported by the #1reasonwhy hashtag,
a trending topic on Twitter in the fall of 2012, under which women in the gaming industry expressed their
distaste for working in the industry and described how issues such as harassment prevented some from
working towards higher-level positions within game companies. These implications hold true not only in gaming but also in
STEM (Science, Technology, Engineering, & Math) related fields (Kafai, et al., 2008). American culture is
redefining sex roles, making “appropriate sex roles and behaviors ambiguous and elusive” (Tragos 2009, p.
544). It is important for the gaming industry to respond to these changes by considering female audiences
during game design, production, and marketing.


3 Methods


Because the visual portrayal of character avatars is a central component of the gaming experience, we
decided to create a visualization of character body types in popular console games. Focusing our research
pool on hardcore games, we compiled an image that outlines the average body form and appearance of male
and female video game characters. We chose twenty-eight characters for each sex and overlaid the twenty-eight
separate images on top of each other with a twenty-three percent opacity effect in Adobe Photoshop. The
resulting “character stack” supports the statistical data presented in Beasley and
Standley’s 2002 study, which revealed that female characters were more likely to show skin than their male
counterparts. It thereby confirmed our assumption that the design of game characters has not evolved (it
continues to feature unrealistic body proportions) since the study was completed over ten years ago. The
silhouette that emerged from the stacked images was used as a template to create sketch renderings of the
“average” female and male character, which reveal a contrast between the idealized and hyper-sexualized
design of game characters and a realistic and non-sexualized representation.
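The stacking step was done in Adobe Photoshop; the following is only a rough sketch of the underlying arithmetic. It assumes ordinary "normal" alpha blending of grayscale layers on a white background; the function name `stack_layers` and the tiny 2x2 example silhouettes are hypothetical.

```python
def stack_layers(layers, opacity=0.23, background=255.0):
    """Composite equally sized grayscale layers in order, each at `opacity`."""
    height, width = len(layers[0]), len(layers[0][0])
    canvas = [[background] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                # Normal blend: keep (1 - opacity) of what is there, add opacity of the layer.
                canvas[y][x] = (1 - opacity) * canvas[y][x] + opacity * layer[y][x]
    return canvas


# Three hypothetical 2x2 silhouettes: 0 = figure (black), 255 = empty (white).
layers = [
    [[0, 255], [0, 255]],
    [[0, 255], [255, 255]],
    [[0, 0], [255, 255]],
]
stacked = stack_layers(layers)
for row in stacked:
    print([round(value, 1) for value in row])
```

Pixels covered by every silhouette end up darkest, which is what makes a shared outline visible. With this kind of sequential blending, later layers contribute slightly more than earlier ones, so the order of the images affects the result.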

We also compared the evolution of character design in Mortal Kombat and Halo. Choosing a
female and a male character in each game, we analyzed the trajectory of avatar design from the
original design to the most recent one. We noticed that with advances in graphics
rendering technology came a desire for anatomical realism, attention to detail, and further body
exaggeration. The female characters have become more slender with bigger breasts and wear less clothing,
while the male characters have become more muscular, aggressive, and powerful in demeanor.


4 Discussion & Analysis 


Our literature review shows that female roles in game storylines are most often secondary in nature (Miller
and Summers 2007; Behm-Morawitz and Mastro 2009; Dill and Thill 2007). Female characters are commonly 
underdeveloped in personality, or not developed at all. Conversely, male characters often possess complex 
and developed personalities that reinforce archetypes of strength, strong will, and independence, making 
them most often the primary and leading characters. These in-game disparities can be viewed as a reflection 
of women’s perceived traits, roles, and importance in non-gaming environments. Moreover, when combined 
with hyper-sexualized character designs, underdeveloped character personalities and plot-lines reinforce the 
notion that women's roles are inconsequential and closely associated with sex. 

In order to create an inclusive experience for women, it is imperative that game companies design 
and market games with female target audiences in mind. Over the past several decades, there has been little 
change in the way female characters are dressed (Glaubke, et al. 2001; Behm-Morawitz and Mastro 2009; 
Dill and Thill 2007; Williams, et al. 2009). Our analysis of character designs supports the conclusions of the 
current literature, stressing that female character representations and roles are often overly sexualized and 
trivialized. Furthermore, character representations have not kept pace over time with evolving social 
standards related to gender equality, female participation, and inclusion. Video game character designs 
continue to play into and reinforce negative stereotypes of women. Lastly, we created a template sketch of 
what “average” male and female characters might look like based upon more common and realistic body 
proportions. Further research in this area could test the effectiveness of such character designs and 
proportions on creating inclusive gaming environments for women by utilizing focus groups and gaming 
prototypes. 


5 Conclusion 


Character designs and representations are critical components of overall game design and should be 
considered when attempting to create an inclusive gaming environment for women. Our project built on
existing research by identifying and highlighting the portions of female characters that are most often
exposed in popular contemporary video games. A visualization of character body types was created using
twenty-eight character images. When compared to their male counterparts, female characters were often
hyper-sexualized in appearance, body proportion, and lack of clothing. A template character sketch of what
an “average” male and female character might look like was created, which contrasts with the norm of
unrealistic female and male character designs. Further research should test and analyze whether the
“average” character designs we have sketched would help
promote an inclusive and welcoming gaming environment for women.


Abstract

This poster investigates whether ready-to-use bibliometric indicators can be used by individual scholars to
enrich their curriculum vitae. Selected indicators were tested in four different fields and across five different
academic seniorities. The results show that performance in bibliometric evaluation is highly individual and
that using indicators as “benchmarks” is unwise. Further, the simple calculation of cites per publication per
years-since-first-publication is a more informative indicator than the ready-to-use ones and can also be
used to estimate whether it is at all worth the scholar’s time to apply indicators to their CV.


1 Introduction


As bibliometric techniques have become readily available and easier to apply at the nano-level, they have
become increasingly used in both self-evaluation and third-party evaluation (Wouters et al 2013). The
ACUMEN¹ collaboration is investigating the challenges this increased use poses for the correct application of
bibliometric indicators to small amounts of data. The term “application” encompasses the correct
interpretation of these statistics on the individual level, the use of indicators as “benchmarks” of scholarly
performance, and the conclusions that can be drawn. These challenges are discussed in many bibliometric
studies, e.g. (Glänzel & Wouters 2013, Bach, 2011, Costas et al 2011, Costas et al 2009, Sandström 2009),
but it is still unclear which indicators are appropriate for which scholars and in which
fields. This study examines this gap in knowledge.

We apply ready-to-use indicators, available through Publish or Perish, and investigate if scholars 
can potentially use these indicators themselves to increase the value of, i.e. enrich, the publication 
information on their CVs. CVs were selected because in an evaluation situation they often represent the 
scholar in the form of a document. Aspects to be considered in the analyses of each of the indicators chosen 
for the study are: 


1. Is the indicator more appropriate in some disciplines than others?

2. Is the indicator more appropriate for some seniorities than others?

3. Is the indicator gender appropriate?

4. Does the indicator produce information that is redundant if used in combination with other
indicators?

5. Does the indicator produce useful information that scholars can use to enrich their CV?

6. Does the indicator have a positive or negative effect on the profile of the scholar?


1 For more information about ACUMEN (Academic Careers Understood through Measurements and Norms) see:
http://research-acumen.eu


2 Method


2.1 Data Collection 


Publication data and ready-to-use bibliometric indicators were obtained for European scholars in the fields
of Astronomy, Environmental Studies, Philosophy and Public Health. Scholars in these fields were identified
in a questionnaire study of scholarly web presence undertaken by the University of Wolverhampton in
December 2011².

Of the 2154 scholars who responded, 793 provided a link to an online CV. We collected publication and
citation data and indicators from Google Scholar via Publish or Perish³ from June 13th to July 20th 2013,
resulting in a sample of 750 researchers with active online CVs. All types of print publications were included
to account for the different publishing traditions of the fields. Publications were verified using the
publication lists on the CVs or via a publication list linked to the CV.


2.2 Dataset


The dataset consists of a sample of 750 researchers: 585 men and 165 (22%) women (Table 1).


                PhD     Post Doc  Assis Prof  Assoc Prof  Prof    Total

Astronomy       15      48        26          67          37      193
Gender M/F      12:3    37:11     20:6        58:9        35:2    162:31
Environment     3       17        39          85          51      195
Gender M/F      3:0     11:6      30:9        72:13       44:7    160:35
Philosophy      9       22        45          75          78      229
Gender M/F      6:3     20:2      37:8        57:18       63:15   183:46
Public Health   9       14        31          50          29      133
Gender M/F      2:7     7:7       18:13       34:16       19:10   80:53
Total           36      101       141         277         195     750
Gender M/F      23:13   75:26     105:36      221:56      161:34  585:165


Table 1: Distribution of seniorities and gender across the disciplines in the sample


2.3 Indicator identification 


The ready-to-use indicators tested in this study are the cumulative indicators of individual performance
from Publish or Perish⁴. They were chosen based on selection criteria presented in our previous review of
114 bibliometric indicators used in individual evaluation (Wildgaard et al, submitted). They are: total
number of papers (P), years since first publication (PY), total number of citations (C), cites per paper
(CPP) and the average number of citations per paper normalized for years since first publication (CPAY),
together with indicators often defined as indicators of “quality”: h-index (h), g-index (g), e-index (e) and
age-weighted index⁵ (AW). With this information the scholar can easily calculate the m-quotient (m) and the
mg-quotient⁶ (mg). These indicators do not adjust for the number of authors per paper or add age-weighting
parameters to each cited article.
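To make some of these indicators concrete, the following sketch computes them from a list of per-paper citation counts. It is not the Publish or Perish implementation. It uses the commonly cited definitions of h, g and e, caps g at the number of papers, and assumes that m and mg are h and g divided by the years since first publication. The citation counts and the 10-year career are invented. AW and CPAY are left out because their exact formulas are not given here.

```python
import math


def h_index(cites):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(cites, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(cites):
    """Largest g (capped at the number of papers) whose top g papers total >= g*g citations."""
    ranked = sorted(cites, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def e_index(cites):
    """Square root of the citations in the h-core in excess of h*h."""
    h = h_index(cites)
    core = sorted(cites, reverse=True)[:h]
    return math.sqrt(sum(core) - h * h)


def indicators(cites, years_since_first_publication):
    p, c = len(cites), sum(cites)
    h, g = h_index(cites), g_index(cites)
    return {
        "P": p,
        "C": c,
        "CPP": c / p,
        "h": h,
        "g": g,
        "e": e_index(cites),
        "m": h / years_since_first_publication,    # assumed: h / PY
        "mg": g / years_since_first_publication,   # assumed: g / PY
    }


# Hypothetical scholar: eight papers, first publication ten years ago.
profile = indicators([25, 12, 8, 6, 5, 3, 1, 0], years_since_first_publication=10)
print(profile)
```

Because m and mg are simple rescalings of h and g by career length, scholars with equal PY are ranked identically by h and m.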


2 http://cybermetrics.wlv.ac.uk/survey-acumen.html 

3 http://www.harzing.com/pop.htm. Publish or Perish is a software program that retrieves and analyzes academic citations obtained
from Google Scholar or Microsoft Academic Search.

4 For more information about the indicators see: http://www.harzing.com/pophelp/metrics.htm

5 AW index: AW is the square root of the number of citations to a given body of work divided by the total number of papers, it
approximates the h-index if the average citation rate remains more or less constant over the years.

6 mg-quotient: mg is a variation of the m-quotient. The m-quotient is h adjusted for the number of years since first publication; mg is
the g-index adjusted for the number of years since first publication.


3 Main results and discussion


Women make up 22% of the overall sample, reflecting the European ratio of men to women in science, 3:1⁷.
In the junior categories (PhD students, post docs and assistant professors) the ratio of men to women is 2:1,
while in the senior categories (associate professor and professor) the ratio is 4:1. This reflects the 2012 SHE
figures on gender in research, confirming that our sample mirrors the share of women employed in academia
across Europe, where gender imbalance increases with seniority⁸.

However, the size and content of the seniority categories were not homogenous. The spread of 
publication and citation data within categories and across fields was highly skewed and it was difficult to 
estimate effects of indicators and detect homogeneity, which is important if we wish to establish performance 
benchmarks. We used quartiles to illustrate the spread of the data and the median or second quartile as 
the best estimate of average performance within group. In all seniorities there were outliers that pulled the 
average performance up or down. Therefore the relative interquartile range (RIQR) was calculated. Even 
when outliers were removed, the variation in the number of publications a scholar produces, within each 
seniority, in Astronomy, Environmental Studies and Philosophy was still very large, but in Public Health 
there was less variation.

To understand whether we need to recommend gender-specific indicators, we studied the
career trajectory of scholars in our sample. Our hypothesis was that a longer publication history in the junior
seniorities could be an indirect indicator of possible discrimination against women or other disruption in career
promotion. PY was calculated and analyzed in panel box plots by gender and seniority to identify differences
in length of publication history between male and female scientists. According to our data, advancement
from PhD to associate professor for both genders was based on a 9- to 11-year publication history.
Professors had a PY 3 to 6 years longer than associate professors in Astronomy and Public Health and
an additional 9 to 11 years in Philosophy and Environmental Studies. Women do not appear to need a higher
number of publication years to advance. We compared the performance of female scholars to male scholars 
within seniority using the other indicators in this study. The performance of each indicator was highly 
individual and no gender-specific patterns were identified. 
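The quartile summary used above can be sketched as follows. The exact RIQR formula is not stated in the text; the sketch assumes the interquartile range divided by the median, and the publication counts are invented.

```python
import statistics


def spread(values):
    """Quartiles plus the relative interquartile range, assumed here to be (Q3 - Q1) / median."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "riqr": (q3 - q1) / q2}


# Hypothetical publication counts for one seniority group, with one outlier (40).
publications = [2, 4, 5, 7, 9, 12, 15, 40]
summary = spread(publications)
print(summary)
print("mean:", statistics.mean(publications))
```

In this example the single outlier pulls the mean well above the median, which is why the median is the more robust estimate of typical performance within a group.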

We took Astronomy as a case study. Scholars were ranked per seniority in descending order for 
each indicator, P, PY, C, CPAY, h, g, e, AW, m, mg. Each ranking was copied to a table depicting the 
performance of all scholars, within seniority, across all indicators. The tables were divided into lower and 
upper quartiles. Each scholar’s placement in the rankings of each indicator was mapped manually and 
categorized as high (3rd quartile), middle (second quartile) or low (1st quartile). This resulted in the 
identification of two groups of indicators. The first group showed predictive relations: h, g, e, AW, m, mg 
where a high, middle or low score on one indicator predicted a high, middle or low score on another. The 
e, AW, m supplemented h while mg supplemented g. The top 25%, middle 50% or bottom 25% scholars 
remained the same but ranked in a different order. 

The second indicator group was “unpredictive” indicators: PY, P, C, CPP, CPAY. For example, 
a low P did not result in a high C - likewise a high PY did not predict a high P. The threshold where the 
ratio C to P resulted in a high CPP was highly individual. No individual or seniority patterns were found 
across this sub-group of indicators, and ranking resulted in different scholars appearing in the top, middle 
or bottom quartiles. No difference was observed between CPAY and m, resulting in redundant information. 

We suspected a ratio relationship between PY, P and C that controlled the level of performance across
all indicators. The ratio “years since first publication to number of publications” was calculated for each
scholar in Astronomy, then the ratio “years since first publication to total citations”. This is the math
behind the CPAY indicator, but the ratio is more informative than the single number CPAY produces, e.g.
Scholar A averages 2 papers per year over his career and receives 28 citations per year = 1 (year) : 2 (papers) : 28
(citations) = 1:2:28 (CPAY=28). By comparing the scholars’ ranks to their ratios we found that the predictive
indicators favour scholars with the ratio “short career : many papers : high citation count” over scholars with
different “career : paper : citation” ratios. To investigate whether it is the number of citations per paper per year
that dictates how useful the indicators will be to the scholar, we divided the number of citations per year by
the number of publications per year for all the scholars identified in the top, middle and low quartiles, e.g.
Scholar A: ratio score 1:2:28, citation score per publication per year = 28/2 = 14. We compared this ratio
score to their rank position and found that the ratios within seniorities fit the whole group, which in our
dataset is a proxy for the disciplinary level (Table 2).


7 Directorate-General for Research and Innovation, Unit B6 (2012) SHE Figures 2012: Gender in Research and Innovation.
European Commission: Brussels. Retrieved from:
http://ec.europa.eu/research/science-society/document_library/pdf_06/she-figures-2012_en.pdf

8 SHE figures 2012.


                        PhD student    Post Doc     Assistant Prof   Associate Prof   Professor

Ranked in top 25%       Not observed   <18          <19              ≤27              ≤28
across all indicators

Ranked in middle 50%    Not observed   >3 but <8    >7 but <18       >10 but <15      >15 but <27

Ranked in bottom 25%    <2             <3           <8               <7               <9

All values are citations per publication per year.


Table 2: Astronomy: Grouped ratios citations to papers to year
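The ratio score behind Table 2 can be reproduced with a few lines. This is a sketch of the arithmetic described for Scholar A; the function name `ratio_score` and the 10-year career with 20 papers and 280 citations are assumptions chosen so that the ratio comes out as 1:2:28.

```python
def ratio_score(years, papers, citations):
    """Return papers per year, citations per year, and citations per publication per year."""
    papers_per_year = papers / years
    citations_per_year = citations / years
    # The score in the text: citations per year divided by publications per year.
    score = citations_per_year / papers_per_year
    return papers_per_year, citations_per_year, score


# Scholar A, assuming a hypothetical 10-year career with 20 papers and 280 citations.
ppy, cpy, score = ratio_score(years=10, papers=20, citations=280)
print(f"ratio 1:{ppy:g}:{cpy:g}, score {score:g}")
```

Computed this way, the career length cancels in the final division, so the score depends only on total citations and total papers. The career length remains visible in the three-part ratio itself.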


4 Conclusion 

Publication and citation data are highly skewed, and using simple average-based indicators as indicators
of performance misrepresents the individual. The heterogeneity of the data makes comparisons to peers
unwise and disciplinary benchmarks uninformative. However, the low variance in the number of publications
between scholars of the same seniority in Public Health shows potential for the development of useful
expected performance benchmarks. Gender-specific indicators were not necessary in this study, although we
are aware of the many other variables in academic careers that can affect the career paths of female scholars.
The h, g, e, AW, m and mg indices supplemented each other but exhibited a predictive relationship. There
was information redundancy between the indicators CPAY and m.

The simple calculation of cites per publication per years-since-first-publication is more informative
about a researcher’s publication activity and citation impact than the ready-to-use metrics. Further, it has the
potential to be used to estimate whether it is at all necessary for scholars to apply indicators to their CV. This is
relevant for evaluation because, instead of adding value to the information on CVs, unnecessary use of indicators
can detract from the value of a researcher’s academic profile.


Abstract

Manga is a Japanese style of comic. Nowadays, production of manga in a digital environment is widely
accepted since authoring/drawing tools on PCs have become popular. However, the digitalization of manga
production covers only the later stages of the whole production process. The goal of this research is to
improve the productivity of manga production using information technologies in the earlier stages of the
process. A fundamental problem exists in information resource management in the production process,
e.g., reuse of unused scenarios for new content or revision of existing character images. In this paper we
propose a manga production support tool which helps creators (re)use existing resources in the production
process, e.g., annotations attached to design memos, communication history, and so on. The tool uses
metadata for the various resources used and created in the production process. This paper describes the
background of the study and gives an overview of the production support tool.


1 Introduction


Manga is a comic form that follows a style developed in Japan and is known as a representative of
modern Japanese pop culture. Manga as a genre of publishing is characterized by the variety of its themes,
genres, drawing styles and publishing styles. Regardless of this variety, the commercial production process of
manga consists of several steps before pictures are drawn. A manga creator uses various information resources
in the production process, e.g., books and dictionaries to collect facts used in a story, images of characters
created by him/herself in previous works or by other creators, and so on. The creator produces various
intermediate products, which may not appear in the final product but are useful in other production activities.
Thus, he/she uses and produces many different resources in a manga production. The goal of this study is
to build a tool to help creators find, use and organize the resources used in the manga production process
in order to improve productivity.

In general, the production process in the current commercial environment is not well recorded. It is still
difficult to access the resources useful for manga production because those resources are mostly on paper,
even though the creators and their collaborators, e.g. editors and assistants, are working in a digital
environment. In particular, semantic relations among the resources and intermediate products are not well
documented, e.g. relationships among characters, the revision history of character image designs, and so on.

Many digital manga are published as collections of image data, which means that the paper page is simply
replaced by a digital image. Every single page contains many components for which the creator has
related resources used during the production process, for example a document about a real-world town
used as a model in a story. Those resources take various forms and are stored in various locations:
a document on the Web, a photograph on a PC, a memorandum on paper, and so forth. Therefore,
management of the resources associated with each page is crucial for improving productivity because
manga creators cannot find those resources easily in a traditional commercial production environment.

The goal of this research is to solve this problem and help manga creators use the resources and
increase their productivity. Our approach is to record the production process of manga and link the
resources and the intermediate and final products by semantic relations. The production support tool
proposed in this paper helps create, edit and manage a variety of resources including discussion memos,
rough sketches, story plots, and so forth. These resources are maintained with their metadata expressed in
a reusable and interoperable form. Interoperability is important not only for sharing metadata among
the different players involved in the production process but also for reusing the resources in another production
process over time. In our tool we use Linked Data (or Linked Open Data) technologies because they are
crucial for reusable and interoperable metadata (Berners-Lee 2006). The main functions of this tool are to
visualize and trace the production process and the changes in its contents. These functions enable a creator
to link useful resources from/to his/her products. Those links will support manga production in his/her
current production process and/or future ones at reasonably low cost.


2 The Design Process of Manga Production and Its Materials 


The design process determines important steps in manga production. First, a manga creator has to
form a clear overall idea of his/her product. He/she has to sort through many ideas and turn them into
pictorial manga expressions. In some cases, manga is produced not by a single artist but by a group of
people who have different specialties. In commercial production of manga, an editor supervises the whole
process, collaborating with the creator(s) and helping him/her polish the manga for publication in accordance
with the commercial goals and restrictions. In a large production process, the upper and lower streams of the
process are carried out by different groups of people, e.g., creators and storywriters work in the upper
stream and authoring tool operators work in the lower stream. Thus, it is crucial to use a tool that helps
the participants share information and resources used in the production process.

In general, the design process of manga consists of three steps: narrative planning, scripting, and
storyboarding. Narrative planning creates the core features and entities included in the story, e.g.,
characters, places, and events. These elements are not only described in text but also expressed as visual
information, e.g. pictures and sketches. These elements may have close relationships with each other, but
each of them is noted down in fragments.

Scripting constructs the story of the manga from the elements made in the narrative planning
step. Text boxes aligned along the story line are often used to give an easy overview of the story structure.
This text-box scripting style is commonly used in Japan to write and polish a story not only for manga
but also for other story-based works such as video programs.

Storyboarding in manga production has the same role as in movie production but its format is
quite different. A storyboard in manga production is a simple expression of the script of the manga allocated
page by page, indicating the layout of frames on each page, the composition of the picture in each frame and
the placement of word balloons.

Thus, the ideas created in the narrative planning step are expressed as a series of scenes in the 
scripting step. Then, the scenes and their elements are expressed in a graphic form (or a form oriented to 
graphic representation) in the storyboarding step. In each of the three steps, several different resources and 
their metadata are created. In our study, we have designed a metadata model to describe the entities in 
these steps, i.e., elements of works, story structure, and graphic object as the resources about manga. We 
use Resource Description Framework (RDF) to encode the metadata and to link the resources. 
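As an illustration of how such metadata might link resources, the sketch below writes a few statements as RDF triples in N-Triples syntax and looks them up with a simple pattern match. It is not the authors' metadata schema, which is not reproduced in this paper: the `http://example.org/manga/` namespace, the property names (`hasCharacter`, `hasPlace`, `revisionOf`, `depicts`) and the resource names are all hypothetical.

```python
EX = "http://example.org/manga/"  # hypothetical namespace, not the authors' schema
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def uri(name):
    """Wrap a local name as an N-Triples IRI in the example namespace."""
    return f"<{EX}{name}>"


# Each triple is (subject, predicate, object); all property names are invented.
triples = [
    (uri("character/Megumi"), RDF_TYPE, uri("Character")),
    (uri("scene/1"), uri("hasCharacter"), uri("character/Megumi")),
    (uri("scene/1"), uri("hasPlace"), uri("place/cowshed")),
    (uri("sketch/Megumi-v2"), uri("revisionOf"), uri("sketch/Megumi-v1")),
    (uri("sketch/Megumi-v2"), uri("depicts"), uri("character/Megumi")),
]


def to_ntriples(triples):
    """Serialize the triples, one statement per line, in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which resources point at the character? (a scene and a design sketch)
about_megumi = match(triples, o=uri("character/Megumi"))
print(to_ntriples(about_megumi))
```

In a real repository the lookup would typically be expressed as a SPARQL query; the pattern match here only mimics a single triple pattern.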
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3 Production Support Tool based on Metadata for Manga Production 


We have experimentally developed a manga production support tool, which helps editing intermediate 
products of manga as well as metadata about the resources used in the manga production. The metadata 
is stored in an RDF repository and searchable via SPARQL which is the query language for RDF databases. 
The production support tool consists of four tools to edit each type of intermediate products — setting lists, 
box scripts, annotations named plot-it, and storyboard. Figure 1 shows the entities of manga production 
with their relations described as metadata and the intermediate products include them (because of the 
limitation of the space, the detailed metadata schemas are not included in this paper). 
Figure 1: The entities of manga production with their relations and the intermediate products that
include them
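As a rough illustration of how metadata expressed as triples can link production resources and be searched by pattern, the following minimal sketch uses a plain in-memory set of triples rather than a real RDF repository or SPARQL engine. All `ex:` terms, resource names and function names are hypothetical and invented for this example; the paper does not publish its metadata schemas.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical vocabulary and resources, invented for this sketch only.
STORE: Set[Triple] = {
    ("ex:daichi", "rdf:type", "ex:Character"),
    ("ex:megumi", "rdf:type", "ex:Character"),
    ("ex:minori", "rdf:type", "ex:Character"),
    ("ex:cowshed", "rdf:type", "ex:Place"),
    ("ex:scene1", "ex:hasCharacter", "ex:daichi"),
    ("ex:scene1", "ex:hasCharacter", "ex:megumi"),
    ("ex:scene1", "ex:hasPlace", "ex:cowshed"),
    ("ex:scene2", "ex:hasCharacter", "ex:daichi"),
    ("ex:scene2", "ex:hasCharacter", "ex:minori"),
    ("ex:page1", "ex:depicts", "ex:scene1"),
    ("ex:page2", "ex:depicts", "ex:scene2"),
}


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples fitting the pattern; None is a wildcard, like a query variable."""
    for triple in store:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def pages_showing(store: Set[Triple], character: str) -> List[str]:
    """Join two patterns: ?page ex:depicts ?scene and ?scene ex:hasCharacter character."""
    scenes = {scene for scene, _, _ in match(store, p="ex:hasCharacter", o=character)}
    return sorted(page for page, _, scene in match(store, p="ex:depicts")
                  if scene in scenes)


if __name__ == "__main__":
    print(pages_showing(STORE, "ex:daichi"))
```

The two-step join in `pages_showing` mirrors what a two-pattern SPARQL query would express; a real repository would perform this matching itself.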


A setting list is a list of notes about the ideas and settings of the story, such as characters, places and
events, all of which are created in the first step of the production process. Every note is a mixture of text
and images because of the nature of manga. URLs of resources accessible on the Internet may be included in a note.

In the second step, where box scripts are created, a box-scripting tool helps create a table of
scripts representing a series of scenes built from the elements included in the setting list. The story
for the manga can then be created from the table of scripts.

The last step is designing the graphic expression of the story, which is the most crucial process
in manga production. The production support tool supports this process with plot-it and the storyboard
editor. Plot-it helps assign each scene to an area on the set of pages. The storyboard editor is for editing
the storyboard of each page on a canvas. We use Scalable Vector Graphics (SVG) to implement these tools in
order to use XML as a common platform for the implementation. Figure 2 shows a screenshot of the user
interface combining three tools (setting list, box script and plot-it) as used when designing the story
structure and graphic expression.
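To make the role of SVG more concrete, here is a minimal sketch of how one storyboard page, with its frame layout and word balloons, could be written out as an SVG document using only the Python standard library. The function name, the coordinate tuples and the balloon size are assumptions made for illustration; they are not taken from the tool described in this paper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SVG_NS = "http://www.w3.org/2000/svg"

Frame = Tuple[int, int, int, int]   # x, y, width, height (hypothetical layout unit: px)
Balloon = Tuple[int, int, str]      # centre x, centre y, dialogue text


def storyboard_page(width: int, height: int,
                    frames: List[Frame], balloons: List[Balloon]) -> str:
    """Return an SVG string with one rectangle per frame and one ellipse per balloon."""
    ET.register_namespace("", SVG_NS)  # serialize with a default xmlns, no prefix
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width), "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    for x, y, w, h in frames:
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(x), "y": str(y), "width": str(w), "height": str(h),
            "fill": "none", "stroke": "black",
        })
    for cx, cy, text in balloons:
        # Fixed balloon radii are an arbitrary choice for this sketch.
        ET.SubElement(svg, f"{{{SVG_NS}}}ellipse", {
            "cx": str(cx), "cy": str(cy), "rx": "80", "ry": "40",
            "fill": "white", "stroke": "black",
        })
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": str(cx), "y": str(cy), "text-anchor": "middle",
        })
        label.text = text
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    page = storyboard_page(
        600, 800,
        frames=[(20, 20, 560, 300), (20, 340, 270, 440), (310, 340, 270, 440)],
        balloons=[(150, 100, "Wash your boots first.")],
    )
    print(page)
```

Because the page is plain XML, the same document can carry identifiers that metadata records refer to, which is one plausible reason for choosing an XML-based format.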




iConference 2014 Tetsuya Mihara et al. 


[Figure 2 screenshot: three panels labeled Setting List, Box Script and Plot-it. The sample content
describes scenes in which Daichi and Megumi visit a cowshed in the research field and Minori explains
the prevention of epidemics.]

Figure 2: User Interface of the combination of setting list, box script and plot-it 


4 Related Works 


The Movie Script Markup Language (MSML) (Rijsselbergen 2009) is a document specification for the
structural representation of screenplay narratives for television and feature film drama production. MSML
has models to describe the scenes, structure and timeline of a screenplay, as well as 3-D animation objects.
MSML is currently serialized into XML documents and is formally described by a combination of an XML
Schema and an ISO Schematron schema.

Pellegrini shows how Linked Data contributes to existing value chains in the content industry by
discussing a BBC (British Broadcasting Corporation) use case in the utilization of semantic metadata for
the management of news content along the content value chain (Pellegrini 2012). The paper discusses the
benefits of semantic metadata and how it can be applied within the existing news production
process, from content acquisition to content consumption by users.


5 Discussion and Conclusion 


In this paper, we proposed a production support tool for manga that aims to improve productivity by linking
resources and intermediate products to the final product and to the production record via metadata
expressed in RDF. As the next step of our research, we will evaluate the usability of our tool and
the productivity gains it provides in more practical production processes. As serial publication of a manga
in a magazine is the most basic and popular publication process in commercial production in Japan, we
need to test our tool for serial publication. In this case, we expect our tool to perform
well because the resources produced in previous publishing processes would be reused frequently in the
iterative production cycle.
Abstract

This poster will present preliminary results from a National Library of Medicine (NLM) funded 
afterschool program, HackHealth, that aims to increase tweens’ interest in health and health science, their 
health literacy, their health-related self-efficacy, and their awareness of the important connection between 
their everyday health behaviors and their ability to maintain their health and prevent disease. The 
program is centered in the school library and the school librarian serves as a partner in facilitating the 
literacy activities. In this poster, we will focus on the divergent experiences of two seemingly similar 
tween participants in our program, Danielle and Tamira, and how their differing experiences may have 
contributed to differences in outcomes, such as the degree of changes in their level of interest in health 
and the broader sciences (science, technology, engineering and mathematics (STEM) fields), in their 
health literacy levels, in their levels of health-related self-efficacy, and in their engagement in relevant
health behaviors.
1 Introduction and Background


In today’s media-rich world, youth are not merely consuming information; they are also creating
information, sharing information, and making decisions based on the information that they receive. Youth
make decisions regarding fashion, restaurants, movies, books, etc., all based on the information they receive
from their circle of friends and family. With the prevalence of web access and mobile technologies, they can
also now obtain advice from anyone in the world, whether an expert in the field or a stranger at a distant
location. A recent study (Meyers, Fisher, & Marcoux, 2009) found that the information behavior of tweens
is quite complex and that tweens develop information literacy not only within the formal school 
environment, but also within the context of their informal relationships. Meyers et al. further found that 
when deciding which sources to consult, tweens attempt to gauge the potential benefits as well as the 
potential social costs of obtaining the information and they may sacrifice information quality in order to 
minimize potential social costs, such as embarrassment. 

One method to reduce the potential of incurring social costs in getting one’s needs for information 
met is to consult the Internet. A recent Pew Internet survey (Lenhart, Purcell, Smith, & Zickuhr, 2010) 
found that 31% of teens (ages 12 through 17) who go online use the Internet to get information about 
health, dieting, or physical fitness. Furthermore, more than 1 in 6 (17%) teens reported using the Internet
to obtain information about hard-to-discuss health-related topics, such as drug use, sexual health, and
depression. This Pew Internet survey also found that looking for health information online is far more
prevalent among teens from low-income (annual household incomes less than $30,000) families. While 11% 
of teens from households earning more than $75,000 per year reported looking for health information online, 
this figure more than doubles (to 23%) among teens from low-income families. 

Despite the fact that these statistics indicate that a considerable proportion of youth use the 
Internet to look for health-related information, this does not confirm the prevailing narrative that suggests
all youth are digital natives — avid technology users who are intrinsically capable of using and synthesizing 
the information that they have found from digital sources (Jenkins, 2006; Palfrey & Gasser, 2008; Prensky, 
2001). In fact, we concur with the finding that the idea of youths as digital natives is problematic, especially 
with disadvantaged youth who tend to demonstrate diverse information and technology skills (Ahn, et al., 
2012; Foss, et al., 2012). With the current health trend that has gradually shifted from one that is primarily 
passive and state-based (i.e., one is either well or ill) to one that is more active and process-based (i.e., one 
is working toward preventing or managing disease), we believe understanding how young people look for, 
evaluate, and use health-related information to make decisions is vital in efforts to address the lack of health 
literacy among young people. We contend that disadvantaged youth do not inherently possess and enact 
health literacy; rather health literacy needs to be facilitated within the context of personal health and well- 
being. Working to improve the health literacy of disadvantaged youth is of great importance because low 
health literacy levels have been found to be correlated with low income levels (NCES, 2003) and to result 
in severe negative consequences for the economically disadvantaged individual, such as poorer health, 
decreased quality of life, and higher mortality rates (NN/LM, 2013). 


2 Purpose of the Study 


In this poster, we report preliminary results from our NLM-funded project in which we have developed and 
are co-implementing an after-school program (HackHealth — for more information, please see: 
http://hackhealth.umd.edu/) with school librarians in three middle schools in the mid-Atlantic region of the
United States. In this after-school program, we work with approximately 7 to 12 youth per school, 
encouraging them to identify a personally relevant health issue (to them and/or their families) they would 
like to focus on, and then assisting them in investigating health concerns that are personally meaningful to 
them (such as sports injuries or diabetes) throughout the program. We engage them in (a) conducting 
scientific inquiry into health maintenance and/or disease prevention and management using the Big6 
information problem-solving model (Eisenberg, 2008; Eisenberg & Berkowitz, 1990); (b) acting as health 
information intermediaries by sharing the information they learn with their family members; and (c) taking 
action based on what they learn through the program. Our overarching goals for the after-school program 
are to increase the interest of youth in the health sciences, their health information literacy, their health- 
related self-efficacy, and their understanding of the crucial link between their daily health-related behaviors 
and their ability to maintain their health and prevent disease. 


3 Research Design 


Participating schools have Title I status in the United States, which means they have a
high percentage of students who come from low-income families and high participation rates in the Free 
and Reduced Meals (FARMs) program. Using ethnographic methods, we collect data on the students’ 
various stages of information seeking using pre- and post-surveys, card-sorting exercises (St. Jean, 2012a, 
2012b), participant observation, search logs, interviews, and focus groups. We are focusing our analysis 
for this poster on two disadvantaged tweens, Danielle and Tamira, who share similar demographics with 
the rest of the participants. Despite their similar backgrounds (in terms of socio-economic status, 
traditional literacy levels, and degree of family and social support), they showed divergent health-related
information behaviors and levels of various aspects of health literacy (including information literacy, 
computer literacy, visual literacy, and numerical/computational literacy) while participating in our 
program. We highlight how they experienced the program, participated in the literacy-based activities
that we implemented in the school library, and developed their health literacy over the 8-week period.
We will share the changes that occurred (or did not occur) in these two focal tweens’ levels of health 
literacy and degree of interest in the sciences and health by presenting salient interactions and events that 
emerged from our data analysis. We will further explore how their experiences in the program may have 
helped to contribute to their increasing (or decreasing or unchanging) interest in a future career in health 
sciences or in science, technology, engineering, or mathematics (STEM) more generally. 


4 Implications and Conclusion 


Health literacy helps youth to understand how their choices affect their health, and encourages them to 
take responsibility for their own health behaviors. This research highlights the importance of literacy
development in increasing health-related self-efficacy among youth, and in improving their understanding of
the crucial link between their daily health-related behaviors and their ability to maintain their health and
prevent disease. Additionally, this research investigates the possible relationship between literacy
development and identity development. We examine how advancement of health trajectories (through 
learning of health literacy) could lead to the development of STEM identity trajectories among youth, 
specifically whether health literacy and understanding personal health contribute to disadvantaged youth 
considering a career in health sciences and whether there is a correlation between health literacy and
the pursuit of STEM courses and careers.
Abstract

This poster introduces lesbian, gay, bisexual and transgender (LGBT) grassroots information 
organizations and outlines a plan for an ethnographic research project to study these organizations’ 
current contexts. Drawing upon queer theory, information grounds, library as place and the new 
librarianship, this project investigates the following questions: Why and how do members utilize and 
develop LGBT grassroots information organizations in this era of ever-proliferating LGBT information?
What are the challenges and opportunities for LGBT grassroots information organizations as they 
navigate different organizational models such as autonomy and institutional partnership? Do, and if so, 
how do queer information activities continue to take place within LGBT grassroots information 
organizations? 
1 Introduction


Within lesbian, gay, bisexual and transgender (LGBT) communities in Canada and the United States there
is a legacy of collecting and disseminating information otherwise ignored or destroyed by society-at-large.
These LGBT grassroots information organizations operate outside of the conventional, professionally
dominated library and archival realm, which creates “queer” informational organizational contexts.
Examples of LGBT grassroots information organizations’ queer strategies include locating their
organizations in private homes and home-like environments, collecting ephemeral materials, and taking an
interest in “everyday” people and their experiences as opposed to famous figures (Cvetkovich, 2003; Halberstam,
2005; Cooper, Forthcoming; Cooper and Przybylo, Forthcoming). In contrast to the earlier era of LGBT 
invisibility and information scarcity that LGBT information organizations emerged from, LGBT 
information, social and networking opportunities are now seemingly plentiful and easily available online. 
The popularity of LGBT and queer studies within the academy and the growing visibility of LGBT issues more
broadly have, furthermore, led to new partnerships between LGBT grassroots information organizations and
other public and private institutions such as academic libraries and archives, banks and corporations; to
the emergence of LGBT information organizations in exclusively academic contexts; and to increased LGBT
programming and resources within public libraries.

Studies on the information-seeking habits of LGBT-identified people demonstrate that the LGBT 
community has information needs that are not met by mainstream libraries and archives and that the 
LGBT community continues to feel distrust and discomfort within mainstream library and archival spaces 
(Creelman & Harris, 1989; Whitt, 1993; Joyce and Shrader, 1997; Taylor, 2000; Hamer, 2003; Rothbauer, 
2004a; 2004b; Rothbauer, 2007). While professionally oriented library literature addresses LGBT issues in 
terms of programming and resource recommendations for LGBT-identified patrons in public and academic
library settings (Martin & Murdock, 2007; Mehra & Braquet, 2007; Greenblatt, 2010), the unique contexts 
for serving LGBT information needs within grassroots organizational settings need to be explored in further 
depth. A rigorous, ethnographically informed qualitative investigation of LGBT grassroots information 
organizations is needed to explore their evolving contexts and the motivating factors behind their continued
creation and use.


2 LGBT Grassroots Information Organizations’ Evolving Contexts 


LGBT grassroots information organizations in Canada and the United States first emerged in conjunction
with the lesbian and gay movement after World War Two. For example, the ONE National Gay and
Lesbian Archives (ONE) in Los Angeles, California began as the private collection of activist Jim Kepner,
who was highly involved in homophile organizing through Los Angeles’ Mattachine Society, and helped
found ONE Magazine and the ONE Educational Institute in the early 1950s. Similarly, the Canadian
Lesbian and Gay Archives (CLGA) in Toronto, Canada was founded in 1973 under the auspices of The Body
Politic, a gay liberation newspaper. The Lesbian Herstory Archives in Brooklyn, New York is also
noteworthy for being formed through lesbian feminist organizing and continues to operate under a lesbian 
feminist mandate. 

Reflecting trends in LGBT activism and increasing LGBT visibility generally, LGBT information 
organizations now exist in a variety of private and public configurations and are not necessarily connected 
to activist activity. Pre-existing, previously autonomous LGBT information organizations are increasingly 
partnering with other institutions: ONE is part of the University of Southern California Libraries and the 
June Mazer Lesbian Archives, also in Los Angeles, has a project to make materials more publicly accessible
with the University of California Los Angeles library and the Centre for the Study of Women. LGBT 
information organizations are also now being created in academic contexts, such as the Transgender 
Archives at the University of Victoria in British Columbia. Other LGBT archives such as the CLGA,
however, remain entirely autonomous, and new autonomous LGBT grassroots information organizations,
such as Out on the Shelf in Guelph, Ontario, also continue to be formed.


3 Investigating LGBT Grassroots Information Organizations Ethnographically 


Drawing upon queer theory, information grounds, library as place and the new librarianship, this project 
investigates the following questions: Why and how do LGBT community members utilize and continue to 
develop LGBT grassroots information organizations in this era of ever-proliferating LGBT information? 
What are the challenges and opportunities for LGBT grassroots information organizations as they navigate 
different organizational models such as autonomy and institutional partnership? Do, and if so, how do queer 
information activities continue to take place within LGBT grassroots information organizations? 

In order to develop a rich descriptive account of the unique activities within particular LGBT 
grassroots information organizations while also developing an analysis that cuts across organizations, 
research for this project will be conducted at two LGBT grassroots information organizations utilizing a 
comparative ethnographic approach. There is a growing movement toward conducting ethnographic 
research within information organizational settings (Foster & Gibbons, 2007; Flinn, Stevens & Shepherd, 
2009; Asher, Duke & Green, 2010; Khoo, Rozaklis). 


4 Abbreviated Literature Review 


Queer theory is a multi-disciplinary critical framework that examines social and cultural activities through 
an outsider or “queer” perspective. As Turner (2004) explains, the term “queer” does not relate to a specific
identity category, but rather to the failure to fit into an established set of societal expectations. While
queer theory research highlights the unique activities that take place in LGBT grassroots information 
organizations (Cvetkovich, 2003; Halberstam, 2005), these works fail to take information-based activities 
explicitly into account. This project, in contrast, aims to place queer theory in dialogue with information 
grounds, library as place and the new librarianship. 
Information grounds captures the informal, socially driven information-based activities that take place in
LGBT grassroots information organizations in addition to more formal information activities. Fisher,
Landry & Naumer (2006) identify a number of factors that contribute to information ground creation,
including conviviality, creature comforts, and location. These factors may help explain why LGBT
grassroots information organizations continue to be popular in an era where LGBT information is widely
available online and in public library settings.

Library as place, as per Buschman and Leckie (2007), allows us to view libraries as “physical entities where
a complex mix of activities, processes and actions occur on a daily basis” (p. 3). Focusing on the “library
as place,” therefore, invites reflection on the location-specific, physical and material qualities of experiences
within LGBT grassroots information organizational environments.

The new librarianship, as per R. David Lankes (2011), advocates for a framework that transitions away
from collections-driven approaches to community-driven approaches, including new partnerships between
academic and public libraries, community members and community organizations. LGBT grassroots 
information organizations provide a case for exploring the potentialities of new librarianship because they 
are community-driven information organizations and because they are increasingly collaborating with public 
and academic libraries and archives. 


5 Conclusion 


This poster will introduce LGBT grassroots information organizations and provide: 1) an overview of the
study design, 2) highlights from the study’s literature review, and 3) updates on progress of the study. This
study will be an
important step towards understanding the evolving context of LGBT grassroots information organizations 
as a facet of LGBT information activity more broadly. As a result, the study places LGBT and queer studies 
in dialogue with research pertaining to community-based information activities including information 
grounds, library as place and the new librarianship. 
Abstract

Digital curation is a rapidly evolving field, driven by a range of factors including the explosion of data 
currently being created, continuous change in information technology and emerging patterns in 
information use. As a result there is a pressing need for information professionals in the work force to 
remain abreast of developments in the field. Prior research conducted as part of the DigCurV project 
resulted in a curriculum framework, intended to guide the development of vocational training curricula, 
and identified a set of 12 characteristics that summarize the strengths of existing training programmes 
while reflecting the on-going training needs of working professionals. This poster bridges the gap between 
these tools by illustrating 8 ways in which the framework can be used to achieve these 12 characteristics.
1 Introduction


Digital curation is a rapidly evolving field, driven by a range of factors including the explosion of data 
currently being created, continuous change in information technology and emerging patterns in information 
use. This situation puts heavy demands on curators and other information professionals responsible for 
maintaining digital objects. In this environment, working professionals need on-going development of their 
skills in order to stay current with technological advances and have the ability to address new challenges. 
A parallel challenge exists for educators seeking to construct viable and effective vocational training 
programmes that address the needs of experienced information professionals. This poster will illustrate how 
the curriculum framework developed by the DigCurV project can be used to facilitate these objectives. 

By virtue of its target audience, vocational training in curation needs to be seen as distinct from
either formal full-time graduate-level professional education or the research data management training given 
to researchers in other fields. Unlike graduate students or researchers, who are often new to the material, 
information professionals bring an existing knowledge base and a well-developed skill set to the training 
situation. These professionals require actionable knowledge and skills that can be immediately applied to
their particular situation.


2  DigCurV Curriculum Framework 


To facilitate effective development of curricula for vocational training, the DigCurV project created a
curriculum framework, the core of which is a series of three lenses or views that encapsulate the skills,
knowledge, competencies and personal attributes required for success in a concise visualization (Molloy et
al., 2013). Each of these lenses reflects the unique perspective associated with one of three major roles in
curation activities: practitioner, manager, and executive. Their content is divided into categories and
expressed as descriptors based on the Vitae Researcher Development Framework and the information literacy
taxonomy developed by the Research Information Network (RIN) (Molloy et al., 2013).
3 Strengths of Existing Training Programmes


In addition to the curriculum framework, the DigCurV project also resulted in a report on the broader 
context of existing vocational training initiatives. This report was based on a survey of 11 currently active 
or recently completed efforts to create, structure or develop curricula for vocational training. In doing so, 
the report identified 12 characteristics that summarize the strengths of existing training programmes and 
reflect the on-going training needs of working professionals (Moles & Ross, 2013). Like the descriptors in 
the framework lenses, these 12 characteristics are intended to be aspirational and to guide educators towards 
success. Currently, no programme contains all 12 characteristics. They reflect diverse strengths aggregated 
for use as an objective for development. The 12 characteristics as presented in DigCurV report D6.2 are
listed below:


• Sustainability; the field is in a constant state of development, so training will need to be a continuous
process if professionals are to remain conversant with the latest advances.

• Consistent Incremental Evolution; programmes must provide a stream of new knowledge as it
emerges as well as instruction in the accepted body of general or foundational knowledge.

• Systematic; a structured approach to training is necessary to ensure that all relevant topics are
included, content is appropriately targeted and redundancy is kept to a minimum. A major step
would result from defining a canon of preservation and curation knowledge that professionals require
and keeping that canon under review.

• Tailored; curricula must fit the needs of the professional community, match the professional roles of
participants, and be complementary to their daily activities.

• Based on expert consensus; curricula should be distinguishable from open research questions in
order to prevent vocational training becoming little more than a weather vane to academic debates.

• Operational; the orientation of the course content should be towards practical results in real world
scenarios. The material presented should be readily applicable in curation workflows.

• Certification; training programmes should be embedded in a certification structure to provide
evidence that professionals have and are maintaining the relevant skill set. Means should be in place
for the maintenance of the certified status through continued training.

• Portable; while the training should be tailored to specific jobs, the skills, knowledge and
competencies learned should be applicable beyond the particular instance of employment.

• Leverage existing knowledge; the participants in vocational training are assumed to be highly
educated information professionals who approach programmes with a well-developed skill set
relevant to the curricula. These skills should be harnessed to maximize the effectiveness of the
training.

• Incorporate participant/trainee feedback; a mechanism should be in place to systematically gather
and evaluate feedback from the audience at every stage. This can be used to evaluate effectiveness
and inform later iterations of the curriculum.

• Address issues of all relevant digital object forms; formats or file types that can reasonably be
expected to exist within the purview of a repository cannot be ignored by training curricula.

• Utilize appropriate dissemination methods; vocational training has a much wider range of potential
delivery methods than other forms of education. The full spectrum of these methods should be
explored in order to provide the audience with learning opportunities that match their needs.


These 12 characteristics describe programmes rather than the curricula to which the DigCurV framework 
is predominantly directed. However, programmes and curricula are interrelated and inseparable. In order 
for programmes to achieve these 12 characteristics they need to overcome obstacles that can be 
conceptualized as tension amongst the range of knowledge, skills and competencies needed by their
audience. This is played out in a training environment through the needs of audience members for content
that is general and conceptual, as well as specific and technical. Examples from recent literature include
the need to balance technical skills with soft skills (de Smaele, Verbakel, Potters, & Noordegraaf, 2013) and 
organizational aspects with domain-specific aspects (Kim, Addom, & Stanton, 2011). Educators developing 
training initiatives have also acknowledged more immediate challenges that are symptomatic of this 
situation, such as defining the boundaries of the content to be covered, the rapid growth of literature in
the field, and the lack of widely accepted best practices (Botticelli, Fulton, Pearce-Moses, Szuter, & Watters, 
2011). The success of vocational training relies on educators navigating these competing needs in the design 
and delivery of their programmes. 


4 Connecting Training Programmes with the DigCurV Curriculum Framework 


The functionality of the DigCurV curriculum framework can be extended to act as a tool for balancing 
these competing needs. By concisely defining audiences based on their operational roles, the framework 
allows educators to position their programmes within this space at a point most beneficial to their audience. 
The descriptors within the lenses can then serve as a focal point around which a balance can be achieved. 
The framework does this in 8 ways that address these 12 characteristics either directly or indirectly, while 
bridging this gap between constructing tailored curricula and structuring effective programmes. These 8 
operations are listed below: 


• Guide the selection of content for inclusion in curricula

• Structure elements of the curricula by linking components through a common perspective and
demonstrating relationships between roles, actions and knowledge

• Provide a knowledge base against which individuals can be tested for certification

• Determine appropriate pedagogical tools and techniques

• Isolate specific audiences for training and tailoring curricula

• Serve as a bridge that links activities, roles and functions with higher-level models for curation such
as OAIS and the DCC Lifecycle Model

• Act as a conceptual meeting point for educators and practitioners in the field

• Function as a means to balance and combine the binaries around which vocational training is often
built


Combined at the point of curriculum and programme development, these 8 uses will allow educators to 
construct programmes that harness the strengths of existing training initiatives and better address 
vocational training challenges. The success of digital curation as both a field of study and an area of practice 
relies on adequately skilled information professionals in the workplace and a symbiotic relationship between 
these two activities. Use of the DigCurV curriculum framework in the capacity described here is a decided 
step in this direction. 
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Abstract 

This action-based National Science Foundation-funded research project is prompted by the growth of 
broadband in industries in rural communities in which over 40% of households lack broadband capability. 
Through the multi-method pursuit of five rural community-focused need and barrier-focused research 
questions, the researchers are investigating the role of Career and Technical Education (CTE) pathways


1 Introduction


The goals of this four-year project are to strengthen the pool of IT/broadband employees (including
general IT, broadband and network technicians); to improve educational support related to broadband,
telecommunications, and networks for future and current IT employees in rural Northwest Florida; and to
understand how to transfer this competency to other similar rural markets.


2 Background 


Low broadband adoption rates in rural communities can be attributed in part to decreased availability of 
broadband service, expense of computers and Internet service, and a perceived lack of need for a household 
connection [1]. Yet, if rural communities are going to capitalize on the benefits that broadband can bring 
for economic development, they will need more employees with advanced, diverse technology skills. 

According to the U.S. Bureau of Labor Statistics, the computer systems design and related services 
industry will be in the top five growth industries for 2008-2018, with the strongest growth occurring in 
network systems and data communications analysis [2]. As government services, health care, business and 
commerce, and social networks are incorporated into this technological advance through the use of 
broadband connectivity, rural communities that are not prepared to exploit broadband will be left without 
technical support for a range of services that further economic development. There is a significant workforce 
need for information technology (IT) workers with skills and knowledge of broadband technology to support 
the needs of rural employers and industries, and in turn economic development. This effort mirrors the 
metaphorical theme of ‘breaking down walls’ through an understanding of rural cultures that may not 
embrace or appreciate the benefits of broadband connectivity.

Previous needs assessments conducted in rural Northwest Florida indicate that broadband is a key
factor in many rural community economic development efforts. The more successful efforts were often the 
result of IT/broadband technicians who possess a complement of knowledge and abilities that include 
advanced technical skills as well as management, supervisory, and critical thinking skills. 

The project’s research will focus on the educational and career pathways of IT technicians who 
support broadband deployment in rural communities in Northwest Florida. The project will identify the 
workplace roles and economic development importance of broadband technicians; the education needed to 
be successful in these roles; the processes to sustain partnerships between educational and industry 
stakeholders; and how rural economic development can be promoted by individuals with these key skills.
This study will contribute to the ongoing development of iSchool literature that seeks to understand the 
use of information technology by those responsible for its facilitation and by those for whom the technology 
is newly acquired. 


3 Research Questions 


RQ 1. How do the IT/broadband skills graduates gain through two-year community college programs 
compare to the needs expressed by employers in rural/metropolitan areas? 

RQ 2. How do the IT/broadband skills graduates gain through two-year community college programs compare to
the skill sets new professionals identify they need after they are hired as IT employees in 
rural/metropolitan areas? 

RQ 3. What, if any, gaps exist between the skills rural/metropolitan employers report their
IT/broadband employees need and the skill sets new professionals report they need to be successful 
as IT/broadband employees? 

RQ 4. What, if any, differences are there between the skills needed for IT/broadband employees in rural 
and metropolitan areas? 

RQ 5. How can two-year community college IT/broadband program curricula be modified to best meet
the specific needs of employers and IT/broadband employees in rural/metropolitan areas? 


4 Project Deliverables 


This study design will develop several project deliverables for the Northwest Florida region including a 
descriptive typology of the career pathways for IT/broadband; a typology of rural IT/broadband technician roles;
identification of gaps between rural employer needs and current two-year curricula; and revised curricula
that address the gaps of current technician education in rural communities. A key element of the project
is to enhance the network between Tallahassee Community College (TCC) and Chipola College (Chipola)
and their stakeholder rural employers. A crucial element of this research will be the matching of 
community college curriculum with employer needs in an attempt to promote economic development in 
rural Northwest Florida. 


5 Methodology 


Overview of Methods: This research will take a multi-method approach, combining qualitative and
quantitative methods with a secondary data analysis of existing data.


1. Literature Review of two-year educational pathways and an environmental scan of all schools
involved to establish known pathways and describe emerging pathways for diverse student bodies.
These pathways will be examined, defined, and included in the study. The environmental scan for each school
will include a list of all classes, curricula, faculty and employers involved and a review of the major
impacts on the school over the preceding five years.

2. Content Analysis of Chipola and TCC IT Course Syllabi and Learning Outcomes: This method
will be used to develop lists of skills gained by students who successfully complete IT coursework.
Syllabi of all undergraduate IT/broadband courses at Chipola and TCC will be collected and
analyzed through an iterative process using open and axial coding. This analysis will be completed 
twice (at the beginning and toward the end of the project), with the first version informing the 
development of other instruments, and the second version allowing the research team to use updated 
information to inform curriculum suggestions. 

3. Content Analysis of IT Job Ads and Job Postings in Northwest Florida: A purposive sample of job 
ads and postings gathered through all available sources, such as contacts at Chipola and TCC,
newspapers, and online job postings will be used to develop a list of job requirements for 
IT/broadband positions in the area. The sample will be stratified by rural/metropolitan 
classifications. This analysis will be completed twice (years one & three), with the first version 
informing the development of other instruments. 

4. Semi-Structured Interviews with IT/Broadband Educators at Chipola and TCC: A purposive sample
of 16-20 educators responsible
for curriculum development and classroom delivery will be interviewed, with proportional 
representation. These interviews will be used to explore factors in curriculum development and 
delivery and to understand the relationship between faculty and industry stakeholders. 

5. Semi-Structured Interviews with IT/Broadband Hiring Managers in Northwest Florida: A purposive
sample of 16-20 Northwest Florida employers will be interviewed. An interview schedule to measure employer needs
can be developed from the literature review, senior personnel input and the project objectives. In
addition, a skills card sort can be employed, with skill sets from the course syllabi and job
ad/posting analyses informing this part of the instrument.

6. Focus Groups with New Professional IT/Broadband Employees in Northwest Florida: Four focus 
groups will be conducted (two of employees of rural organizations and two of employees of 
metropolitan organizations). Two purposive samples of IT employees (one for each focus group) will 
be developed by asking IT/broadband hiring managers to identify potential participants from their
organizations.


On-site Classroom Observations and Secondary Data Analyses of previous needs assessments will also be 
conducted to ensure face validity of findings. 


6 External Evaluation and Assessment Plan 


Project evaluation will focus on formative and summative measures. Formative aspects of the evaluation 
will center on the conduct of the research, the quality and trustworthiness of the research methods 
undertaken, the quality of the collaboration between TCC and Chipola, dissemination efforts and the
sustainability of the research and its application. Summative evaluation will focus on the extent to which
the researchers met their goals and the impact that the research products and outputs had, and potentially will have,
on IT/broadband education in two-year colleges.


7 Conclusion


In this project, the researchers will investigate how to break down the walls to broadband-enabled economic 
development by identifying effective and sustainable preparation for IT/broadband technicians in Northwest 
Florida. This project supports the NSF Merit Review criteria through local, action research that will topple 
walls of resistance to technology transfer with sensitive approaches to culture and context using new 
broadband technology that supports these rural communities, and will be a key, innovative addition to iSchool
research and literature.


Abstract

This poster outlines the preliminary findings of the EDITS study: an inquiry into the digital information
habits of senior citizens. The research presented here will focus on the adoption of ereading technology 
by seniors in order to determine the habits and attitudes, motivations, and barriers experienced by this 
demographic. Employing semi-structured interviews and the Unified Theory of Acceptance and Use of 
Technology (UTAUT), this study aims to investigate one element of the digital divide that sometimes 
goes unnoticed: age. Despite ingrained habits based on print, findings show motivations, such as 
convenience, contribute to the adoption of ereading by seniors.


1 Introduction


Studies of the digital divide in Canada show that seniors are lagging behind in access to the Internet, skill 
level, and use of social media (Quan-Haase, Haight, & Corbett, 2013). As individuals who did not grow up 
with computers or other digital tools, this population finds it harder to fully immerse themselves in the
digital realm. While children born today may be exposed to such technology from birth, seniors have had 
to adapt to many changes throughout their lives and make choices about the adoption of various digital 
technologies. Studies show that many seniors embrace these changes and are eager to learn (Nasmith &
Parkinson, 2008); however, some are naturally more resistant.

Reading is a unique practice that is culturally and socially ingrained in a person’s everyday life. 
Seniors are a large demographic of readers: survey research shows that seniors identify reading as both 
“enjoyable” (Ngandu & O'Rourke, 1979) and “very important to them” (Dettlaff-Lubiejewska, 2008). 
Reading preferences and practices developed over a lifespan can be difficult habits to break. To examine 
these habits, we ask the following questions: How do seniors approach ereading? Are they hesitant to change 
their reading habits or are they willing to try digital tools and read etexts? Investigating these research 
questions is important because texts are increasingly available in digital formats on computers, smartphones, 
tablets, and dedicated ereaders (Burritt, 2010). 

The following research questions will be addressed: 


1) How have ebooks changed the reading habits of seniors? 
2) What motivates seniors to use ebooks in their reading for pleasure? 


2 Literature Review 


Previous research has examined the attitude of readers towards ebooks (Burritt, 2010; Chou et al., 2010),
ebooks in historians’ scholarship (Martin & Quan-Haase, 2013), changes in ereading behavior (Liu, 2005;
Gardiner & Musto, 2002), the paper-digital divide (Luff et al., 2004), the design of ebooks (Luff et al., 2004),
and ebooks and publishing (Tian & Martin, 2010). While little work has been done specifically on seniors
and digital reading, there has been research conducted on their Internet use (Quan-Haase et al., 2013).
Peacock and Kunemund (2007) note that “lack of a device, motivational indifference and deficient 
knowledge are the main reasons behind non-usage of the Internet by seniors” (p. 198). Similarly, Morris, 
Goodman and Brading (2007) found that seniors were unaware of the Internet, were afraid to use it, or felt 
that they were getting too old to be bothered with the technology. Regardless, research has been done that
looks at how to meet seniors’ information needs in the changing digital environment (Godfrey & Johnson,
2009; Campbel, 2008). 

As readers, senior citizens have been the focus of studies about reading habits, attitudes, and 
motivation (Ngandu & O'Rourke, 1979; Robinson, 1980; Scales & Biggs, 1983; Schutte & Malouff, 2007). 
These studies reveal that factors, such as education, income, socioeconomic status, ethnicity, and sex, 
contribute to reading patterns and motivation (Grubb, 1982; Kling, 1982). Extensive studies have also been 
conducted on the specific problems faced by senior readers, their reading needs, and how the library can 
facilitate solutions and programs that address the issues and requirements of seniors (Carsello & Creaser, 
1981; Romani, 1983; Herr & Bridgland, 1989; Anderson et al., 1992; Dettlaff-Lubiejewska, 2008). While 
many of these studies identify issues such as impaired vision, reduced hearing, memory loss, diminished 
mental acuity, and limited mobility, only a few address the effects of a digital reading environment on seniors
and the unique challenges and possibilities of digital technology in overcoming these issues. Ordonez et al.
(2011) investigate the effects of a digital inclusion program on the cognitive performance of elderly adults
in areas such as orientation and attention, memory, verbal fluency, language, and visuospatial skills. Their findings
suggest that the acquisition of knowledge and the use of digital tools may bring cognitive gains. Those 
studies that address the shift towards digital reading do not look broadly at adoption but, rather, focus on 
the development of specific programs and services that address barriers of elderly reading. Solutions to these 
barriers are examined in the development of educational programs (Irizarry et al., 1997; Bean, 2003) or the 
adaptive design of digital products such as video games and websites (Pernice & Nielsen, 2001; Ijsselsteijn 
et al., 2007). 


3 Theoretical Framework 


The Unified Theory of Acceptance and Use of Technology (UTAUT) was used as a framework for analysis. 
This theory presents four constructs (performance expectancy, effort expectancy, social influence, and 
facilitating conditions) (Venkatesh et al., 2003) and identifies gender, age, experience, and voluntariness as 
moderators of the four constructs. Age is the only moderator to affect all four constructs. As age is a defining 
parameter of the senior demographic, this theory works well to establish how age affects the acceptance of
ereading.


Figure 1: UTAUT Model (Source: Venkatesh et al., 2003)


4 Methods 


Semi-structured interviews were conducted with 15 seniors (ages 60+) between September 2012 and June 
2013 in London, Ontario, Canada. Participants were recruited through posters and information sessions at 
local events for seniors. Analysis employed grounded theory modeled after the approach outlined by 
Charmaz and Belgrave (2003) and Charmaz (2006).


5 Findings


5.1 Habits 


Table 1: Preliminary coding chart

Code / Definition: Physicality of print books
Example:
S11: “I like the tactile experience of holding a book”

Code / Definition: Nostalgia and reluctance to depart from established reading habits.
Example:
S15: “Well, I suppose it’s habit. It’s gone on a long time, you know. Um, I’m not--I find it more difficult to read stuff on screen--you know, on the computer screen. Um, and maybe it’s because I haven’t done enough of it. I tend to--it takes the joy out of reading.”

Code / Definition: Ebooks are for young people: Belief that they are too old to enjoy ereading, or to learn the technology required for reading digitally.
Example:
S14: “... the younger generation is probably more comfortable doing everything with the computer, whereas I’m not.”

Code / Definition: The portability of ebooks and the ability to carry multiple ebooks on one device.
Example:
S12: “Now, Ken Follett’s books are big and heavy. I think an ereader would probably be a lot easier in that respect.”

Code / Definition: The ability to take many ebooks when traveling as opposed to one print book.
Example:
S12: “I don’t have an ereader. Um, I would like to get one. And I think it’s convenient to take a ereader with a number of books on it travelling, because I do like to travel. Rather than carry the bulkiness of a book”
S11: “They’re convenient when you’re travelling, for example.”
S10: “Yeah, so that is great for travel. I could say I prefer it for travel.”

Code / Definition: Trust of Internet and electronic purchasing: Trust plays a role in whether they purchase online or not.
Example:
S11: “I can’t tell you why. I just don’t feel comfortable, so I don’t do it.”

Code / Definition: Digital reference materials: Enewsletters or library bulletins that affect reading material choice.
Example:
S14: “Because I subscribe to a, um [pause] a newsletter. An American newsletter, Book Browse, on new books and book reviews, so probably, um, more aware of more American publications that I might put on my to-read list than I would otherwise, having just [pause] read reviews in McLeans or The Globe or something like that, which would be more Canadian-oriented. So I suppose it’s changed maybe that... little bit.”

Code / Definition: The internet is used to check the availability of reading materials at bookstores and/or libraries.
Example:
S11: “Um, no, I’ll look up online to see the availability...”
S9: “I have also gone online at home so that I can reserve books.”
S2: “When you go on the main website to, um check the catalogue, or to put a book on hold, y’know there’s always a little sidebar thing telling you about what’s going on, so I’ve seen it there as well.”

Code / Definition: Expressed an interest in trying an ereader.
Example:
S11: “I would be open to trying one. Just to see--just to experience it.”
S12: “I don’t have an ereader. Um, I would like to get one.”


In general, seniors report that ebooks have not changed their reading habits, as they continue to rely on
print books, newspapers, magazines, etc. Nonetheless, they also report that they rely on a wide range of
digital materials. Although a number of the seniors do not read ebooks, they do obtain information about
print books through digital means. For instance, they read summaries and recommendations on the Internet
or in enewsletters. They also acquire logistical information about libraries and stores from electronic sources.
These additional digital resources help seniors make their reading selections. Further, even those who read
exclusively in print obtain books through online purchasing or reserve library books online because that is
often most convenient for them. Hence, the digital is playing a central role in supplementing information
and print reading habits.


5.2 Attitudes 


Seniors are still reluctant about digital reading and in particular reading entire books in digital formats. 
However, some seniors expressed interest in obtaining an ereader and starting to explore the experience and 
features of digital reading. They were open to new forms of reading and thought it could provide them with 
alternative means of accessing information and reading books. Yet, a majority of seniors were reluctant to 
adopt ereaders and preferred to leave digital reading to younger generations, claiming that it is a good thing
for young people but not for them. Surprisingly, though, they were comfortable using email, Google, and
other sites but felt that reading at length was something they would rather do in print.


5.3 Motivations


Convenience is cited as a large motivator for seniors to use ebooks. Those who do read ebooks enjoy both
the portability of ebooks and the easy customization of font size and backlighting, which helps overcome
issues such as impaired vision. Further, many of the seniors who do not read ebooks assert that ereading seems
very convenient, especially for travel, and some indicate a willingness to try ebooks in order to gain that and
other practical affordances. However, despite some motivation and willingness there is still a dedication to 
the familiarity of print books. Even those who use tablets and ereaders sometimes prefer the physicality — 
look and feel — of print. Hence, digital reading is in most cases done in addition to print reading. 


[Figure 2 diagram. Recoverable labels: Convenience of ereading; Comfort with Technology; Friends and Family; Established Reading Habits; Intention to adopt ereading. Moderators: Gender, Age, Experience, Voluntariness of use.]


Figure 2: Tentative UTAUT for seniors ereading model


5.4 Barriers 


Familiarity with technology, such as the regular use of computers, contributes to seniors’ decision to read 
digitally. Hence, the low digital literacy levels of seniors can be a barrier to the adoption of ereaders. 
Findings show that lack of trust in digital technologies such as the Internet, especially with regard to online
purchasing, as well as limited knowledge about and experience with devices such as tablets and ereaders,
hinder seniors’ adoption of digital reading.


6 Conclusion 


As digital reading evolves and increases in popularity, it will continue to affect each demographic of readers 
in different ways. Scholarship detailing the use of technology by seniors shows that while there may be
trends towards acceptance, many members of this population do not feel comfortable with or knowledgeable
about technology. As is evident in the existing literature identifying the information needs of seniors and
positing solutions to perceived problems, the knowledge gained in this project will aid in library collection 
and program development, industry practices, and technology development for this growing population. 
Considering the number of senior citizens in the current population and their affinity with reading, ebook 
designers and ereader marketers could benefit from the following findings: 


1. Seniors desire clear features for sharing ebooks with friends (potentially via email, as this is an area 
of the Internet where they are comfortable). 

2. Seniors should be made aware of ereaders’ capability for keeping track of their reading habits.
Many seniors keep a log of what they have read and their thoughts on the book, author, period,
etc., and are not aware of these features of ereaders. This could also be an added form of social
media that readers could share with friends, members of book clubs, etc.

3. Features that aid visual impairment (such as text enlargement and font selection) and night reading 
(reading in bed) should be prominent in marketing ereaders to seniors and educating seniors about 
ereaders. Seniors often read before bed and are receptive to features that facilitate that reading 
habit.


Abstract

This poster examines the Facebook presence of two grassroots political movements as two distinct 
information worlds. Upon examining Facebook pages that supported and decried the Tea Party and 
Occupy Wall Street movements, it was found that while there were some clear differences in ideology 
and information values, there was also a considerable amount of overlap in rhetorical approach and 
“positioning.” Situating this behavior within Erdelez’ (Burnett & Erdelez, 2010) idea of “exploded 
context,” which conceptualizes context as always multiple, this research also offers the opportunity to 
explore barriers between the groups, and whether these are surmountable or an innate part of the groups’ 
information worlds.


1 Introduction: Exploded Context


Speaking at a 2009 “Fishbowl” session about the role of context in information behavior at the ASIS&T 
Conference held in Vancouver, B.C., Sanda Erdelez (Burnett & Erdelez, 2010) observed that, in a world 
mediated by networked technologies, “context” must always be understood to be multiple rather than 
singular. Further, information — and the ways in which it is encountered, used, understood, and valued — 
is thus also always a function of the multiple contexts in which it is found and of the ways in which it is
mediated. To put it simply, a bit of information, even if it may originate in a particular context where it 
has particular significance, ends up being housed somewhere else such as an online site where it becomes 
accessible to a widely distributed audience of individuals, each of whom is situated in his or her own — and 
often incommensurate — context. In social media settings such as Facebook, where information exchange is 
inextricably intertwined with social interactions between often widely divergent groups of people, such 
contextual multiplicity becomes a defining aspect of information: it is always cut loose from its original 
context and situated in a socio-technical environment that is marked by exploding and intersecting contexts. 

This poster presents one part of a case study of political communication and information sharing — 
by American Tea Party and Occupy groups — on Facebook, exploring a series of questions related to this 
sense of intrinsically multiple contexts and the impact such multiplicity has on the value and meaning of 
information. The fundamental question asked here is: What happens to “meaning” when context is
intrinsically multiple? Several other related questions further inform our investigation:


• What is the relationship between intersecting/exploding contexts and representations of political
events?

• Is the value of political information in such contexts in line with or separate from the factual
accuracy in the representations of political events?

• Can information be meaningful and of value to a group if it is demonstrably factually inaccurate?


To conduct this study, a set of openly and publicly accessible Facebook pages — both pro and con — devoted 
to the Tea Party and Occupy movements were chosen; choices were limited to pages that were explicitly 
categorized as community or non-profit organizations or as media outlets, in such a way that they were
clearly distinguishable from private or individual pages. Each page was sampled once a week over the
course of two months, and the samples were analyzed inductively to allow interpretations and conclusions
to emerge from the data itself rather than from pre-determined concepts.


2 Literature Review: Political Information in Online Settings 


Recently, a considerable amount of scholarship has addressed online political activity, particularly as it 
relates to social media. Kushin (2009) examines Facebook as a tool for political discussion, finding that 
most conversations take place between those with similar viewpoints, but that conversation with those who 
have differing viewpoints was civil in the majority of cases. Lotan et al. (2011) examined the level of 
involvement in the Tunisian and Egyptian uprisings by different groups of Twitter users, and found that 
news content was being co-constructed by activists and bloggers alongside those working as journalists. 
Groshek and Al-Rawi (2013) studied 1.42 million Facebook and Twitter messages during the 2012 U.S. 
Presidential Election, and found that candidates were framed by opponents’ supporters in overly critical
ways.

Fernandes, Giurcanu, and Bowers (2010) studied student Facebook groups supporting the two 
major party candidates in the 2008 elections. They found that students used the site, that the two groups 
displayed different levels of site activity, and that, while they focused on different issues, both used it to 
facilitate discussion and foster political involvement. Woolley, Limperos, and Oliver (2010) also found higher 
levels of activity and Facebook group membership among Obama supporters, noting that racial, religious, 
profane, and age-related language also varied in relation to how each candidate was portrayed. Skinner 
(2011) situated the Arab Spring and Occupy movements within three of the paradigms outlined in Raber 
(2003), with a focus on media use. 


3 Ideological Positioning, Accuracy, and Competing Information Values 


Much of the information exchange that takes place on Facebook pages sponsored by groups aligning 
themselves either with or against the Tea Party and Occupy groups on Facebook occurs through the posting 
of “memes,” which are, most often, relatively simple captioned images referencing a recent event or issue. 
Such memes are not only mechanisms for spreading information, but are also, almost always, explicitly 
ideological, staking out clear positions on the events or issues represented. In many cases, conversations — 
sometimes brief, but sometimes very lengthy — follow these posts, in the form of “comments” posted by 
people who “like” the pages. Such conversations can range from consistent expressions of support for the 
positions articulated in the memes to often vehement arguments between supporters and detractors; 
comment threads may be either moderated or not, with moderators sometimes deleting comments that may 
be deemed to be somehow offensive, often because they express — sometimes very bluntly — disagreement 
with or criticism of the positions advocated. 

One frequent way of staking out a position is to bluntly criticize the opposing position, and those 
memes that clearly do so are highly valued across groups. This is often seen with overt “othering” behavior, 
portraying opponents as monsters or hateful, as in the following meme from a pro-Occupy group: 

[Image: meme from the Facebook page “The Other 98%” (311,375 likes; 326,186 talking about this)] 

Figure 1 


Such an act establishes, concisely and in an easily-digestible fashion, a context that relies heavily upon 
creating distance between opposing views, visually situating the context of the opposing viewpoint 
somewhere “over there,” and alien to the followers of the page. 

Ideological positioning in memes also relies upon conveying the popularity and centrality of a group; 
Tea Party and Occupy groups, accordingly, often rhetorically situate themselves explicitly as the true 
representatives of “We the People,” thus implicitly marginalizing opposing groups. For instance, one of the 
most striking examples on Occupy pages (below) is the use of photos from large-scale protests worldwide, 
usually re-shared from other Occupy pages. These photos are typically not captioned by page admins, and 
the few with captions show statements of support for the protestors. Those that are not taken at Occupy 
protests are the only ones with consistent captions, and these usually include a statement explaining why 
the protest is taking place and how the driving ideals behind those movements are the same as those of the 
Occupy movement. 

Figure 2 


Perhaps not surprisingly, one of the most common occurrences in the use of memes is the invention of fake 
quotes as captions, as in the following from the page of an anti-Tea Party group: 

Eric Cantor: “We blocked the Violence Against Women Act because the Senate forced it on us without our 
consent. I'm sure women understand.” (@TeaPartyCat) 

Figure 3 



iConference 2014 Gary Burnett & Julia Skinner 


Such rhetorical moves function, to a great extent, as somewhat heavy-handed satire; here, direct irony 
(Republicans cannot oppose violence against women because to do so would be the equivalent of rape) is 
employed in order to suggest that there is a critical contradiction at the heart of conservative politics. In 
such cases, comment threads often erupt in arguments that explicitly circle around the relationship between 
factual accuracy and what could be called the information value of the position expressed. In this case, 
supporters contend that the quote — despite being clearly falsified — accurately reflects a truth about the 
politics of the Republicans in the U.S. House of Representatives who voted against the Violence Against 
Women Act (or, as one commenter put it, bluntly, “Republicans support violence against women”); 
conversely, critics argue that misrepresentation should not and cannot be “true” in any sense, and is always 
a trick used by those who espouse the opposite ideology (as one commenter said, “Wow, how far back does 
somebody have to go to fact check all your sh*t?! This is the problem with you liberals, you don’t fact 
check anything.”). 

What is particularly interesting is that such questions about the relationship between factual 
accuracy and information value play out in precisely the same way on both sides of the political divide, and 
occur regularly on pages related to both the Tea Party and the Occupy movement, and on both pro and 
con pages. For example, the same debate unfolded, at considerable length, on a pro-Tea Party page in the 
comments about the following image, of an ex-marine who positioned himself as a guard, in full uniform, in 
front of a school following the mass shooting at the Sandy Hook Elementary School: 


Figure 4 


In this case, the issue of the value of false representation did not involve any misrepresentation in the meme 
itself, but rather out in the world: here, the debate revolved around the question of whether an ex-marine’s 
choice to present himself as though he were on active duty protecting children was an effective framing of a 
valiant act or, as one commenter put it, made him and his act a “fraud,” because “the information is 
wrong.” 
4 Problematizing the Wall: Can Barriers be Broken? 


A wall, built of rhetoric and ideological positioning, seems to exist between these two sets of activists, 
evident in the examples above. Further, each of these cases can be seen in terms of exploded context. Each 
refers directly to a specific “real world” context, in which a particular event took place and re-situates that 
event, via a meme, into the context of a Facebook page; further, each reader of that Facebook page is, him 
or herself, positioned in multiple additional contexts: their own “real world” setting and the context of their 
own particular mix of their Facebook friends and the information that they choose to access. Context is, 
in other words, intrinsically “exploded” and multiple in each of these cases, and the value or meaning of 
the information being exchanged is always a function of that multiplicity. Exactly the same information, 
filtered through a Facebook page and referencing exactly the same event, to put it another way, means 
fundamentally different things to different observers, depending on where they are situated in terms of 
ideology and social context(s). And this appears to be the case across Tea Party and Occupy groups 
throughout Facebook, even when those groups use identical strategies for offering up information; the two 
groups, interacting in a distributed context that somehow brings them together, express fundamentally 
incompatible understandings of the political “realities” of the world around them. 

This raises the question: Can the wall separating groups such as these be broken down in the kinds 
of online information worlds we’re examining, or do the exploded and multiple contexts inherent in such 
a setting make such walls an integral part of the structure of those worlds? Based on several pages from each political 
movement, it seems evident that, while cross-group conversations do occur, they are often dominated by 
overt and ongoing attempts by members of each group to discredit the other, often by criticizing rhetorical 
strategies used consistently by both. While the question remains open, this case seems to suggest that the 
contextual multiplicity of technologically mediated social information sharing on sites such as Facebook 
may do as much to build such walls as it does to break them down. 
Abstract 

Comic books can potentially be a useful tool for education in public health crises, especially in areas with 
high rates of illiteracy. This research presents the results of a quantitative survey of public health comics 
from the 1940s to the present. The data show how the form and function of these comics have changed. 
The results show that one of the top challenges for public health professionals interested in using comic 
books to solve a health crisis is locating them, as a significant amount of this material is produced 
and distributed in channels outside of traditional archiving and cataloging. 

1 Introduction 


Illiteracy is never a good thing, but it can mean life and death in a public health context. Unfortunately, 
many of the world’s most pressing public health problems are situated in areas with high rates of 
illiteracy. Comic books are a natural fit for this problem. Comics have a long tradition of sharing 
information through images without having to rely too heavily on text. Comic books are also low cost, 
easily portable, and don’t require electricity or other technology, meaning they can be used in virtually any 
environment. 

In fact, comic books have been addressing issues in the public health sphere since before World War II 
(Grand Comics Database 2013); however, the categorization and organization of comics can make finding 
them difficult. Some comic books are categorized as graphic novels, and thus are treated like traditional 
books. However, what we might think of as traditional comic books have often been immune to the indexing 
and organization that has accompanied most media. Even as new, commercial comic books have moved 
towards digital distribution, the search functionality within them is limited or altogether absent. It is 
difficult enough to be a person with a low level of literacy who is involved in a public health crisis; it should 
not be difficult for health information professionals to find materials appropriate to the problem. 

There are organized efforts to help public health professionals interested in the use of comic books 
to meet professional goals. The web site Graphic Medicine (http://www.graphicmedicine.org) has been 
running since 2007, and features reviews of dozens of comic books and graphic novels that tackle health 
issues from around the world. Bert Hansen’s (2004) paper, Medical History for the Masses: How American Comic 
Books Celebrated Heroes of Medicine in the 1940s, shows that comic books have been used commonly in 
public health education for over 70 years. The main problem with an organized study of comic books in 
public health stems from their production and cataloging. For example, under the Dewey Decimal System, 
all comic books are filed under the number 741.5. 

Commercial graphic novels tend to be categorized like traditional books, at least in libraries. Finding 
public health works that fall into this category is no more or less difficult than finding other books on the 
same subject. Traditional, periodical comic books tend not to be archived by most public or academic 
libraries, and are generally only found in libraries that have special collections, such as Indiana University’s 
Lilly Library. On the surface these might seem like a minor footnote in the world of public health 
information; however, organizations like Planned Parenthood have employed comic books in public health 
campaigns since the 1950s. Ongoing comic series of the 1940s, like True Comics and Science Comics, 
regularly featured public health-related materials. Many public health comics are very difficult to find using 
traditional research methods, and are often ignored by comic book archivists. For example, the online comic 
archive titled The Grand Comics Database has information on over 400,000 comic books, but does not have 
information on the Planned Parenthood comics, which is especially interesting as some public health comics 
have included mainstream superheroes like Spider-Man. For example, Marvel Comics and Planned 
Parenthood published a comic book together about birth control in 1976 (Lee 1976). The third venue for 
public health comic books is individual titles produced by both NGOs and governmental agencies. These 
comics were often produced for a single limited print run and are often not distributed widely enough for 
libraries or other archives to collect. 

Fortunately, remnants of many public health comics reside in pockets of the Internet maintained 
by individual hobbyists and writers. Besides the previously mentioned website Graphic Medicine, which is 
run by medical and academic professionals, people have archived old public health comics for both nostalgic 
and comedic purposes. This research is based on a survey of 254 public health-related comic books and graphic 
novels, from 1940 to the present. 

The first step in assessing the contents of public health comic books was a wide attempt at finding 
as much of this material as possible. The first line of inquiry stemmed from previous academic work in the 
area. Although there is not extensive academic writing on the subject, papers by Green (2010), Williams 
(2011) and Hansen (2004) provided pointers towards historical and influential uses of comics in health. 
McAllister’s 1992 paper on comics in the AIDS crisis also helped identify comics from that era. Interestingly, 
one of the best archives for public health-related comics was assembled for comedic purposes. Ethan Persoff’s 
website Comics With Problems is an archive of comics from around the world, and is frequently cited by 
academic authors on the topic. Internet searches based on these materials led to other sources, like the 
American Social Security Administration's archive of vintage comics produced to promote social well-being. 

As these comics were located they were categorized in a database, with appropriate metadata, such 
as year published and health topic addressed. Once this database was completed the comics were also 
categorized by the type of intended learning, based on a modified version of Gagné’s learning taxonomy. 
The comics were sorted into four categories: Historical Information, Preventative Instruction, Shaping 
Attitudes, and Personal Narrative & Exploration. The Historical Information category is a modified version 
of Gagné’s category of learning called Verbal Information. Verbal Information is largely based on knowledge 
of facts, and the comics in this category are historical overviews of health science breakthroughs. The 
Preventative Instruction category is based on Gagné’s category of learning called Intellectual Skills. In the 
Gagné context, Intellectual Skills is about procedural knowledge, and this category is made up of comics 
on how to avoid health problems. The Shaping Attitudes category is based on Gagné’s category of 
attitudinal learning, meaning the intent of these comics is to shape mental processes that impact future 
decision-making. Lastly, the category titled Personal Narrative & Exploration is based on Gagné’s 
Cognitive Strategies category of learning, meaning comics that are meant to help people shape their own 
thinking and problem solving. Gagné’s taxonomy has a fifth category, Motor Skills, but none of the comics 
found were intended to teach people how to do exercises or other physical activity. 
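
The categorization scheme described above can be sketched as a small data structure. The Python below is only an illustration: the names `ComicCategory`, `GAGNE_BASIS` and `ComicRecord`, the field names, and the sample entry are all assumptions made for this sketch, and it is not the database actually used in the survey.

```python
from dataclasses import dataclass
from enum import Enum


class ComicCategory(Enum):
    """The four survey categories named in the text."""
    HISTORICAL_INFORMATION = "Historical Information"
    PREVENTATIVE_INSTRUCTION = "Preventative Instruction"
    SHAPING_ATTITUDES = "Shaping Attitudes"
    PERSONAL_NARRATIVE = "Personal Narrative & Exploration"


# Gagné learning category that each survey category is based on (per the text).
# Motor Skills is omitted because no surveyed comic fell under it.
GAGNE_BASIS = {
    ComicCategory.HISTORICAL_INFORMATION: "Verbal Information",
    ComicCategory.PREVENTATIVE_INSTRUCTION: "Intellectual Skills",
    ComicCategory.SHAPING_ATTITUDES: "Attitudes",
    ComicCategory.PERSONAL_NARRATIVE: "Cognitive Strategies",
}


@dataclass
class ComicRecord:
    """One database row; the field names are assumptions for illustration."""
    title: str
    year_published: int
    health_topic: str
    category: ComicCategory

    def gagne_basis(self) -> str:
        """Return the Gagné category underlying this record's survey category."""
        return GAGNE_BASIS[self.category]


# Hypothetical sample entry, not taken from the survey data.
sample = ComicRecord("Example Comic", 1985, "birth control",
                     ComicCategory.PREVENTATIVE_INSTRUCTION)
print(sample.gagne_basis())
```

Keeping the mapping to Gagné's categories in one table, separate from the records, means the modified taxonomy can be revised without touching the catalogued entries.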


2 Conclusion 


The results showed that public health comic books have grown over time, in terms of length, topic, and 
complexity. Of the comics surveyed for this research, only 7 could be understood by people with limited 
literacy skills: a series of comics on birth control from Nigeria produced by Planned Parenthood in 1985. 
Thus about 97% of public health comics surveyed require at least some level of literacy to understand. The 
average length of the materials surveyed grew from 12.6 pages for the materials produced in the 1940s and 
1950s to an average length of over 106.3 pages for materials produced after the year 2000. The instructional 
aim of the comics surveyed also changed from simpler objectives to more complex and cerebral goals. 
The early public health comics were overwhelmingly aimed at either historical information or preventative 
instruction. These early public health comics generally were aimed at teaching children about the origins of 
medical and scientific breakthroughs, or aimed at teaching readers procedures to avoid medical maladies. 
As time progressed, more of the comics were aimed at attitudinal goals, such as drug abuse prevention and 
acceptance of AIDS sufferers. In the last 20 years the trend was towards more long-form works, and the 
comics found were often reflective pieces that narrate the author’s struggle with physical or psychological 
disease. 
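
The literacy share and the growth in length reported above follow directly from figures given in the text (254 comics surveyed, 7 usable with limited literacy, average lengths of 12.6 and 106.3 pages). A minimal Python check, with function and variable names chosen only for this sketch:

```python
TOTAL_SURVEYED = 254        # comics in the survey (from the text)
LOW_LITERACY_READABLE = 7   # comics usable with limited literacy (from the text)
EARLY_AVG_PAGES = 12.6      # average length, 1940s and 1950s (from the text)
RECENT_AVG_PAGES = 106.3    # average length after 2000; the text says "over" this


def share_requiring_literacy(total: int, readable: int) -> float:
    """Percentage of surveyed comics that need at least some literacy."""
    return 100.0 * (total - readable) / total


pct = share_requiring_literacy(TOTAL_SURVEYED, LOW_LITERACY_READABLE)
growth = RECENT_AVG_PAGES / EARLY_AVG_PAGES  # lower bound on the growth factor

print(f"{pct:.1f}% require some literacy")       # 97.2% require some literacy
print(f"average length grew about {growth:.1f}x")  # average length grew about 8.4x
```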

From an information perspective, the biggest challenge in the use of public health comics is 
in locating them. While more modern graphic novels might be easy to locate, there is a significant amount 
of material that is not cataloged in an easily accessible way. A significant percentage (29.3%) of the public 
health comics found were located through nontraditional means, and were not subject to traditional 
cataloging methods. 

The results of this research show that there is still room for improvement in the use of comic books 
in the public health sphere. It is clearly a positive sign that comics have been used for a growing range of 
public health concerns; however, the potential strengths of comics in reaching illiterate populations have not 
been fully utilized. Comic books in public health in many ways mirror the growth of comic books as a 
medium, growing from material that is simplistic and meant for children to material that can have depth and 
is intended for adults. In a public health crisis, potentially helpful media material should be readily accessible, 
and unfortunately too much of this material is not archived appropriately. 


3 References 


Gagné, R. M. and Briggs, L. J. (1974). Principles of instructional design (2nd ed.). Holt, Rinehart, and 
Winston. 

The Grand Comics Database (2013). Retrieved August 27, 2013 from http://www.comics.org 

Lee, S. (1976). The Amazing Spider Man: The Pull of the Prodigy. Marvel Comics Group, NY. No. 1538 
3-76/1 3.3 

Hansen, B. (2004). Medical history for the masses: how American comic books celebrated heroes of 
medicine in the 1940s. Bulletin of the History of Medicine, 78(1), 148-191. 

McAllister, M. P. (1992). Comic books and AIDS. The Journal of Popular Culture, 26(2), 1-24. 

Persoff, E. (2013). Comics with Problems. Retrieved August 25, 2013 from http://www.ep.tc 

Williams, I. (2011). Graphic medicine: how comics are revolutionizing the representation of illness. Hektoen 
International: A Journal of Medical Humanities, 3(4), December 2011. 




Searching for an Agile Approach to Methods and Methodology in the Mobile Arena 


Matthew Pointon¹ 


¹ Northumbria University 


Abstract 

This paper describes an investigation into the synthesis of user information behavior models and 
technological models of mobile computing to provide a new approach to smart/mobile user testing. 
Smartphone take-up has exploded, growing faster than any consumer technology in history. This 
technology has altered the way we communicate and has become a key source of information that has 
surpassed email as the core communication mechanism (Naughton, 2012). To design tests for mobile 
applications that are workable and useful to a smartphone user, there needs to be an appreciation of the 
many situations and contexts of use. Tests need to consider different technological configurations and 
environments; ignoring these factors could have serious implications for use and device interaction. It has 
been noted that many mobile testing practices “lack the realism” (Kjeldskov & Stage, 2004). With the field 
evolving rapidly, researchers are developing new test methods and adapting existing ones to support these 
technological advancements. These methods need to be continually challenged to support the mobile 
development community. 

1 Introduction 


Smartphone take-up has exploded, growing faster than any consumer technology in history. This technology 
has altered the way we communicate and has become a key source of information that has surpassed email 
as the core communication mechanism (Naughton, 2012). To design tests for mobile applications that are 
workable and useful to a smartphone user, there needs to be an appreciation of the many situations and 
contexts of use. Tests need to consider different technological configurations and environments; ignoring these 
factors could have serious implications for use and device interaction. It has been noted that many mobile 
testing practices “lack the realism” (Kjeldskov & Stage, 2004). With the field evolving rapidly, researchers are 
developing new test methods and adapting existing ones to support these technological advancements. These 
methods need to be continually challenged to support the mobile development community. The aim of this 
research is to investigate user information behavior models and technological models of mobile computing 
to provide a new approach and model to support user testing. This hybrid user testing model will test 
mobile devices in a naturalistic setting aimed at supporting testing agility. 


1.1 Background: 


The challenges facing a tester’s ability to accurately map a mobile user’s experience have been acknowledged 
by a number of researchers within the Human Computer Interaction (HCI) field, Lindroth, Nilsson 
and Rasmussen (2001) being among the most prominent. They analysed the implementation of mobile 
tests, discussing a range of environmental and configuration factors that “might make the result irrelevant 
since it fails to take the context of its use into consideration” (Lindroth et al., 2001, p. 1). In their paper 
they look at different testing contexts, confirming that a setting can be easily arranged and manipulated for 
computers in a lab, which are more or less in the same context as office and home computers. The mobile 
context is substantially different, as there are so many influences on the tester and the person using the 
mobile device (Lindroth et al., 2001). Modelling experiences and interactions between complex environments 
and configurations is a challenge, and one that Olsen fittingly defines as “Chaos”. These factors can hinder 
the tester; chaos here is defined as the limited ability to communicate with each other, often hindered by 
data formats, processing capabilities and interaction styles (Olsen, 1998). Testers are evaluating mobile 
users but struggling to “model the properties and viability in such a way that we can begin to solve the 
problem” (Olsen, 1998, p. 4). Mobile testing strategies need to be adaptive and have the ability to model 
or at least simulate realistic contexts, environments and configurations to help solve the problem, test a 
design assumption or a behavioral response. 

Testing strategies tend to be built around models. Models have been introduced, developed and 
refined, pushing forward the user-testing discipline as the technologies change with time. For example, 
quantitative models (GOMS, KLM, Fitts’ law, ACT-R, etc.) and qualitative methods (heuristics, contextual enquiries, 
cognitive walk-throughs, etc.) have been used within a variety of different settings with varying levels of 
success. In one of the earliest papers evaluating mobile interactions on the move, Johnson (1998) stated 
that testers are well equipped to model cognitive aspects of users and their tasks, and to model aspects of 
collaborations. These models have stood the test of time, working extremely well in many lab-based 
configurations, but how do they fare in an increasingly mobile information society? There have also been a 
number of attempts to build models within mobile computing: Olsen stated a need to master the chaos, and 
Kristoffersen and Ljungberg (1999) created a basic reference model of Mobile Informatics. The model aims 
to “reflect the ways in which using IT in mobile setting differs from using IT in stationary settings” 
(Kristoffersen & Ljungberg, 1999, p.13). Their research categorised mobility into three 
components: environment (observable, physical surroundings of the situation), modality (archetypes, called 
wandering, travelling and visiting) and applications (technology, data and program). 


1.2 The need for user behaviour models 


The Reference Model has supported user testing in the mobile computing research community by providing a 
testing context. A citation analysis shows that subsequent papers have used and adapted this reference 
model (e.g. Pirhonen, Brewster and Holguin, 2002; Goodman, Brewster and Gray, 2004; Kaikkonen, Kallio, 
Kekäläinen, Kankainen and Cankar, 2005; Roto et al., 2004; Barnard and Yi, 2007; Schmiedl, Blumenstein 
and Seidl, 2011; Sun and May, 2013). On deeper 
analysis, these research papers tend to focus on fragmented or small design aspects within the application, 
not the overall experience and barriers facing the mobile user. The research methods tended to use mini or 
predefined scenarios, such as testing metaphors (Pirhonen et al., 2002), opening and closing applications 
(Kaikkonen et al., 2005) or map locations and navigation structures (Schmiedl et al., 2011). Discounting or 
not following up the information processing behaviours and seeking approaches, a key motivator when 
interacting with the mobile device, could impact on the ability to evaluate the application and usability of 
the device. 

Using information needs as a starting point will complement and support the context formed by 
the Kristoffersen and Ljungberg reference model. There have been a huge number of research studies in the 
information needs field, and many have presented sound models to help explain user behaviour and information 
seeking approaches (Wilson, 1981 & 1997; Ellis, 1989; Kuhlthau, 1993; Spink, 1997; Choo, Detlor 
and Turnbull, 1999). Many of these models fit with traditional information research, supporting the 
diverse information seeking approaches on-line and off-line, but as mentioned by McKenzie (2002) they seem 
to be limited in their ability to describe everyday life information seeking. She goes on to state that such models 
tend to focus on active information seeking, to the neglect of less-directed practices (McKenzie, 2002, p. 
20). These neglected practices could be: configuration factors, contextual factors, environmental factors, 
motion or modality, social concepts and psychological concepts of information needs and user behaviour. 
This is not to say all models ignore these factors; it depends on their research focus. However, the model 
of information behaviour presented by Wilson (1996) introduced “intervening variables”, which linked 
psychological and environmental factors to a need associated with the information user. Saracevic’s (1997) 
stratified model of IT interaction took this further, identifying cognitive, affective and situational levels, but not 
really considering the modality issues associated with new mobile users. Both of these support elements of 
a mobile user and could complement the mobile user tests. 

Reviewing these user behaviour models, it became apparent that Wilson’s model provides the 
researcher with a clearer path or adaptive framework to complement the Reference Model. Fusing these models 
together creates a hybrid model that aims to bridge the gap between HCI/usability testing and 
user/information needs, taking into account mobile use and context. The model builds upon the 
environmental and modality settings of Kristoffersen and Ljungberg with Wilson’s intervening variables to 
provide a flow to aid the test. The flow will evolve into “information based” scenarios that will guide the tester, 
enabling them to plan the configuration factors, contextual factors, environmental factors, motion or 
modality etc. 
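
One way to picture an “information based” scenario of this kind is as a record that combines the reference model components (environment, modality, application) with intervening variables. The Python below is a loose sketch under that assumption: `FieldScenario`, its field names and the sample values are hypothetical and do not come from the paper, which does not specify a concrete format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Modality(Enum):
    """Mobility archetypes named in the Kristoffersen and Ljungberg model."""
    WANDERING = "wandering"
    TRAVELLING = "travelling"
    VISITING = "visiting"


@dataclass
class FieldScenario:
    """Hypothetical information-based scenario for one field test."""
    information_need: str   # starting point for the scenario
    environment: str        # observable physical surroundings
    modality: Modality
    application: str        # technology, data and program under test
    # Intervening variables keyed by kind, e.g. psychological or environmental.
    intervening_variables: Dict[str, str] = field(default_factory=dict)

    def checklist(self) -> List[str]:
        """Flatten the scenario into planning items for the tester."""
        items = [
            f"need: {self.information_need}",
            f"environment: {self.environment}",
            f"modality: {self.modality.value}",
            f"application: {self.application}",
        ]
        items += [f"{kind}: {note}"
                  for kind, note in self.intervening_variables.items()]
        return items


# Invented example values, for illustration only.
scenario = FieldScenario(
    information_need="find the next bus home",
    environment="busy street at night",
    modality=Modality.WANDERING,
    application="transit app on a smartphone",
    intervening_variables={
        "psychological": "user is in a hurry",
        "environmental": "poor lighting and traffic noise",
    },
)
for item in scenario.checklist():
    print(item)
```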


2 Conclusion 


The research will implement this hybrid model to assess (by ethnographic means) mobile testing practices. 
This type of research is based around an investigative exploration of the students’ ability to understand the 
hybrid model and how they might evaluate this to support agile approaches within natural mobile testing 
environments. Depicting this type of social phenomenon in a natural setting allows the researcher to follow 
the interpretivist paradigm, specifically empirical interpretivism (Pickard, 2007), which considers complex 
interactions and acknowledges that researcher and research influence each other directly and indirectly. 
Ingwersen (1984) explains that researchers who apply the social perspective to set a context “see information 
users first of all as the members of a particular community, social category or group” (Ingwersen, 1984, 
p.88). The hybrid model identifies the user groups as a social category, in this case within a University 
context. This interaction with the model on their course will consequently impact upon the overall research 
output. The success of the research outputs is determined by field tests conducted by the students, observed 
by the researcher. The tests take place in real/natural contexts, and as such tend to relax 
experimental controls to produce more naturalistic conditions (McNeil and Chapman, 2005). McNeil and 
Chapman also explain that “field experiments appeal to interpretivists because they tend to focus on how 
the real world is interpreted by people who inhabit it” (McNeil and Chapman, 2005, p. 77). Interpretivists 
take the view that multiple realities based on individual interpretation exist and they advocate interaction 
with research participants to generate outputs, in contrast to the positivist tendency to test hypotheses. 


2.1 Contribution to Knowledge 

This hybrid model aims to support the research field in two ways: firstly, informed by an original synthesis of 
information behavior and mobile computing models, by showing how theoretical modelling can be implemented for 
practical use; and secondly, by demonstrating how information modelling can support critical thinking within 
experimental computing practice. 

Abstract 

This study analyzed user queries submitted to an academic digital library for four weeks (July 2012 to 
August 2012). We examined users’ query behaviors and compared external and internal users’ query 
patterns for image-based collections. The results of this study identified the most frequently occurring 
queries, the mean of query strings, the term frequency, the most frequently used word pairs and the 
relationship between query terms. Transaction log analysis is useful to examine users’ information seeking 
behavior effectively due to the richness of data. The query analysis of this study shows not only users’ 
information seeking behaviors for image-based collections but also the differences between external and 
internal users’ query patterns clearly. 

1 Introduction and Literature Review 


Users search in various types of IR systems for their information by formulating queries in a search box. 
Search facilities for different types of Web IR environments may appear similar, with a search box and 
execute button, but the contents can be different (Wolfram, 2008). For the examination of users’ information 
seeking patterns, transaction log data are used in many studies. Log data help researchers identify and 
understand hidden and invisible user visit patterns (Zhang, 2008). Jansen (2006) noted that stored data in 
transaction logs of web search engines, intranets and websites can offer valuable insights about the 
information searching process of online searchers. Many researchers (Jansen, Spink and Saracevic, 2000; 
Spink, Wolfram, Jansen and Saracevic, 2001; Jansen and Spink, 2006; Zhang, Wolfram and Wang, 2009) 
have conducted transaction log query analysis of websites. Wolfram (2008) analyzed query characteristics 
in a bibliographic databank, an OPAC, a search engine, and a specialized search system such as HealthLink. 
Wang, Berry and Yang (2003) investigated an academic website’s query trends and patterns with 
transaction log data during a four-year period. 

However, there has been little transaction log analysis of users’ queries in academic digital
libraries, in particular image-based digital collections, although studies have analyzed users’ queries
for image retrieval through surveys and interviews (Choi and Rasmussen, 2003). The purpose of
this study is to identify and understand users’ query searching behavior and their query formulation patterns
in an academic image-based digital library. It also compares the characteristics of external queries and
internal queries.


2 Research Questions 
This study investigates the following research questions: 
a) What characteristics of querying in an academic image-based digital library can be identified?
b) Are there different characteristics of querying between external and internal users for digital image
collections?
c) What differences, if any, are there between experts and novices in query formulations in an
image-based digital library in terms of advanced search options such as Boolean operations?


3 Method 


Transaction log data for four weeks (from July 29, 2012 to August 26, 2012) were made available by the
University of Wisconsin-Milwaukee Libraries digital collections. The collections consist of over 54,000
photographic images, maps, and special collections. The masked IP address field and the referral field were
extracted from the raw transaction log file and saved as a plain text file. There are two kinds of search
query strings: those originating from the digital collection site (internal) and those from outside sources
such as search engines (external). The internal search interface uses “CISOBOX1” as the field name for the
primary search function, while external searches were identified by “q” as the query field. To compare
internal and external searches, all lines with the string “CISOBOX1=” in the referral field and all lines
with the string “q=” in the referral field were extracted into two files. In both files, the entries were
sorted in alphabetical order of the masked IP addresses. From the sorted data sets, all lines that repeated
the same referral field from the same IP address were eliminated. The repeated lines were generated because
a single displayed page often has multiple components such as images or icons. In this way, all the unique
queries were identified.
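
The selection and de-duplication steps described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the function name `unique_queries`, the record layout (a pair of masked IP and referral string), and the example URLs are all assumptions introduced here.

```python
from typing import Iterable


def unique_queries(records: Iterable[tuple[str, str]], marker: str) -> list[tuple[str, str]]:
    """Select records whose referral contains `marker`, sort them by masked IP,
    and keep only the first of any repeated (masked IP, referral) pair.

    `marker` is matched as a plain substring ("CISOBOX1=" or "q="), following
    the description in the text; a stricter parameter match may be preferable.
    """
    selected = [(ip, ref) for ip, ref in records if marker in ref]
    selected.sort(key=lambda rec: rec[0])  # order by masked IP address
    seen: set[tuple[str, str]] = set()
    unique: list[tuple[str, str]] = []
    for rec in selected:
        if rec not in seen:  # drop repeats caused by page components
            seen.add(rec)
            unique.append(rec)
    return unique


# Hypothetical records: (masked IP, referral field)
records = [
    ("ip1", "http://example.org/results.php?CISOBOX1=milwaukee"),
    ("ip1", "http://example.org/results.php?CISOBOX1=milwaukee"),
    ("ip2", "http://search.example.com/search?q=family+notices"),
    ("ip0", "http://example.org/results.php?CISOBOX1=china"),
]
internal = unique_queries(records, "CISOBOX1=")
external = unique_queries(records, "q=")
```

Keeping the masked IP in the key means that the same query issued by two different addresses is still counted twice, which matches the description of removing repeats only within the same IP address.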

To analyze the records at the query and term level, only the query strings were extracted from the
URL-encoded referral fields. The extracted query strings were further cleaned by removing all
non-alphabetic characters (digits and special characters) to make counting simpler. All the remaining
alphabetic characters were converted to lower case for more accurate word and query counting.
Relationships among query term pairs were compared using the network analysis software Pajek
(http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
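
A short sketch of the extraction and cleaning steps, using only the Python standard library, might look like this. The helper names `extract_query` and `clean_query` and the example URL are hypothetical. The sketch also assumes that removed characters are replaced by a space rather than deleted outright, and that only unaccented Latin letters count as alphabetic; the study does not state either detail.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_query(referral: str, field: str) -> str:
    """Return the URL-decoded value of `field` in the referral URL, or ''."""
    params = parse_qs(urlparse(referral).query)
    values = params.get(field, [])
    return values[0] if values else ""


def clean_query(raw: str) -> str:
    """Replace runs of non-alphabetic characters with a space, lower-case the
    rest, and collapse whitespace (assumption: A-Z/a-z only are kept)."""
    letters_only = re.sub(r"[^A-Za-z]+", " ", raw)
    return " ".join(letters_only.lower().split())


raw = extract_query("http://search.example.com/search?q=Pearl+Harbour%2C+1941", "q")
cleaned = clean_query(raw)
```

Here `raw` holds the decoded string with its comma and digits, and `cleaned` holds only lower-case words separated by single spaces, which is the form needed for query and term counting.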


4 Results 


4.1 Query Analysis 

External and internal query data were analyzed to compare the characteristics of the two groups.
There were 2,825 external query lines and 11,194 internal query lines. Table 1 shows the 30 most frequent
query strings for external searches and those for internal searches. The most frequent query in the external
data is the two-word query “family notices” with 24 occurrences (Table 1).


External Query Strings                    Frequency   Internal Query Strings       Frequency

family notices                            24          tsybikoff g ts               1481
scenes in the city posting the messages   23          central tibet                543
Subjectarticles                           15          wisconsin                    182
kindergarten union                        13          near south side              145
knit cast on                              11          china                        138
frank bradley                             11          milwaukee                    120
the dawn                                  9           james groppi papers          98
knitting patterns                         9           sss                          90
auk kie                                   9           hong kong harrison forman    89
Baumgarten                                9           manila                       81
china military police                     7           california                   79
Gwalior                                   7           hong kong                    72
vegetation map of asia                    7           turkey                       70
shanghai evening post and mercury         7           east side                    69
Vietnam                                   7           downtown                     69
pearl harbour                             7           new york                     66
Hiroshima                                 6           united states                64
Refugees                                  6           people                       59
empire theater                            6           am                           59
Deegan                                    6           forman harrison              57
empire theatre                            6           nanniwan                     56
Schomberg                                 6           southeast side               55
ellen white                               6           dwellings                    53
Thailand                                  5           near north side              53
asylum hill hotel                         5           documents                    51
ellen brown                               5           henan                        48
functions of internet                     5           west side                    47
Morey                                     5           afghanistan                  46
use and misuse of internet                5           hong                         45


Table 1: External and Internal Query Strings


The most frequent internal query is the three-word “tsybikoff g ts” (referring to the Russian explorer
Gombojab Tsybikov, Romanized as Tsybikoff) with 1,481 occurrences (Table 1). The top query strings are
related to historical topics. As Jones and others (2000) found, the queries included users’ spacing errors
such as “subjectarticles” and spelling differences between British and American English such as “pearl harbour”.
Figure 1 shows that the mean length of external query strings is 2.45 terms with a standard deviation of 1.569.
The mean length of internal query strings is 1.96 terms with a standard deviation of 1.341 (Figure 2).
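
As an illustration of how such figures can be obtained, the sketch below computes the mean and standard deviation of the number of terms per query. It assumes whitespace-separated terms and the sample standard deviation; the study does not say which estimator was used, and the function name `query_length_stats` and the example queries are introduced here only for illustration.

```python
from statistics import mean, stdev


def query_length_stats(queries: list[str]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of terms per query."""
    lengths = [len(query.split()) for query in queries]
    return mean(lengths), stdev(lengths)


# Hypothetical cleaned queries, one list entry per unique query line
sample = ["family notices", "wisconsin", "near south side", "china"]
avg_terms, sd_terms = query_length_stats(sample)
```

For the four example queries the term counts are 2, 1, 3 and 1, giving a mean of 1.75 terms per query.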
Figure 2: Internal Query Strings

4.2 Term Analysis


For the analysis of individual terms used in queries, the 30 most frequent query terms from the external
queries and from the internal queries were identified. Since the purpose of this study is to explore
term frequency patterns, all observed words were included in the analysis, including the Boolean operators
“AND” and “OR” as well as stop words (e.g. the, of, and, a). In the external data, apart from the stop
words, the most frequently used query terms are name-related terms used for people searches (e.g. John,
William, James, Thomas, Henry, Peter). In the internal data, the most frequently occurring terms are
geographic place names such as Tibet, Milwaukee, Wisconsin and China rather than stop words. The
data also revealed that the top individual terms are much more consistent with the top queries among
internal queries than among external queries.


4.3 Word Pair Analysis 


To analyze the relationships among query terms, all possible pairs of terms from each query string were
generated by a PHP script. The most frequent word pairs in the external data are “use and of”, “or and or”,
and “map and of”. In the internal data, the most frequent word pairs are “g and ts”, “tsybikoff and g” and
“tsybikoff and ts”. Thus, the top internal word pairs are consistent with the top internal query strings. To
examine the relationships among query terms visually, all the terms in each data set (external vs. internal)
were visualized with Pajek, one of the most popular network analysis tools. The data were converted to Pajek
format using “txt2pajek.exe” with the weighted option for the pair frequency. To avoid overly complex displays
and to obtain meaningful outputs, only the top 100 word pairs from each data set were included (ties were
counted). Figure 3 illustrates the relationships among external search terms and Figure 4 shows those among
internal ones.
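
The pair generation was done with a PHP script in the study; the following Python sketch shows one way the same idea could be expressed. It is an assumption that pairs are formed by position within a query (so a repeated term can pair with itself, as in “or and or”) and that “ties were counted” means every pair tied with the last ranked pair is kept. The function names `count_word_pairs` and `top_pairs_with_ties` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_word_pairs(queries: list[str]) -> Counter:
    """Count every pair of terms co-occurring in a query, in order of appearance."""
    counts: Counter = Counter()
    for query in queries:
        terms = query.split()
        counts.update(combinations(terms, 2))  # all position pairs (i < j)
    return counts


def top_pairs_with_ties(counts: Counter, n: int) -> list[tuple[tuple[str, str], int]]:
    """Return the n most frequent pairs plus any pairs tied with the n-th one."""
    ranked = counts.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(pair, freq) for pair, freq in ranked if freq >= cutoff]


# Hypothetical cleaned queries
queries = ["tsybikoff g ts", "tsybikoff g ts", "hong kong"]
pair_counts = count_word_pairs(queries)
top = top_pairs_with_ties(pair_counts, 1)
```

In this example the three pairs from “tsybikoff g ts” each occur twice, so asking for the single top pair returns all three because they are tied. The resulting weighted pair list is the kind of input that would then be converted for a network analysis tool.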
Figure 3: The relationship among external search terms


4.4 Differences between Experts and Novices 


Experts tend to use various search tactics more frequently in their searches than novice users. We could
observe differences between experts and novice users by analyzing the search queries that use Boolean
operators and wildcard characters, although their frequencies are low. Only 26 Boolean search queries were
observed, and only 4 queries using a wildcard character were identified.
Figure 4: The relationship among internal search terms


5 Discussion and Conclusion 


The query analysis in this study shows clear differences between external and internal query patterns.
Internal query strings, query terms and word pairs are consistent with one another, while the
external ones do not show this consistency.

User queries in this academic digital library and in the search engine Excite (Jansen) show some
similarities. The average length of queries in both IR systems is short, with a mean of around 2 terms per
query. However, there are differences in the search topics. Queries in this academic digital library concern
historical and geographical topics, while the search topics in search engines range from sexual
topics to entertainment.

The limited size of the data collection does not allow broad generalizations. A further limitation of this
study is that query term frequency was analyzed literally. For example, Hong Kong is a two-term place name,
but in the single term analysis it was divided into two single terms, which were ranked 9th and 10th among
the most frequently used query terms. Because of the small amount of data, spelling errors were not examined.
In a larger study in the near future with additional data, more sophisticated data examination will be conducted.

The results of this study are useful for understanding users’ searching behavior, especially their
query patterns in an image-based academic digital library. There is little research on transaction log
analysis of digital libraries. Hence, this study will contribute to a better understanding of users’ interaction
with digital libraries, especially those with image-based collections.
Abstract

Using game mechanics to improve motivation and effort is becoming a popular approach. Especially in
higher education, many projects have been realized to create greater engagement of students in learning
processes. Because these ideas are new, there is a lack of corresponding evaluation methods that
take all relevant aspects into account. We therefore created an evaluation model that meets the needs of
evaluating a game-based learning approach.
1 Introduction


The use of game elements in non-gaming contexts, also called gamification, is becoming more and more
popular. The idea is used, for example, to encourage users in social online services like Foursquare or to get
them involved in health and ecology issues. The main aim is to increase the motivation of users to deal with
specific subjects or to attract user attention (Deterding, Dixon, Khaled, & Nacke, 2011). To illustrate the
concept of gamification, typical game mechanics are listed and explained in the following:


• quests
• (experience) points
• level systems
• leaderboards
• achievements


As in a computer game, people have quests, a kind of exercise they have to solve. This can be a
fight against a monster or the task of classifying a book correctly in a library. The central point is that there
is no punishment if the quest is not solved or if the answer is wrong. Instead, the user can retry and test
another solution strategy. For fulfilling a quest successfully the user gains
experience points (XP). The XP reached indicate the user’s abilities and knowledge. Furthermore,
the points are connected directly to the level system. By reaching a pre-defined number of points the user
progresses to a higher level, which is often related to new abilities of the character. The levels and the points
are not only an indicator of the user’s own accomplishments but can also be used for
leaderboards. In this way the user is motivated on the one hand by solving quests and on the other hand
by the competition with other users.

Moreover, achievements represent another game mechanic. Achievements are small successes
that are not linked directly to the quests; for example, a user may have to fulfill a dozen quests or be online
for two hours non-stop to gain an achievement (Zichermann & Cunningham, 2011).
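
To make the interplay of quests, points, levels and leaderboards concrete, the following sketch models them in a few lines. It is purely illustrative: the XP thresholds, the reward values and all function names are assumptions made here and do not come from any of the cited systems.

```python
from bisect import bisect_right

# Hypothetical XP needed to reach levels 1, 2, 3 and 4
LEVEL_THRESHOLDS = [0, 100, 250, 500]


def level_for(xp: int) -> int:
    """Return the level reached with `xp` points (level 1 starts at 0 XP)."""
    return bisect_right(LEVEL_THRESHOLDS, xp)


def complete_quest(xp: int, reward: int, solved: bool) -> int:
    """Add the reward for a solved quest; a failed attempt costs nothing,
    so the user can simply retry with another strategy."""
    return xp + reward if solved else xp


def leaderboard(players: dict[str, int]) -> list[tuple[str, int]]:
    """Rank players by XP, highest first."""
    return sorted(players.items(), key=lambda item: item[1], reverse=True)


xp = 0
xp = complete_quest(xp, reward=60, solved=False)  # wrong answer, no penalty
xp = complete_quest(xp, reward=60, solved=True)
xp = complete_quest(xp, reward=60, solved=True)
ranking = leaderboard({"alice": xp, "bob": 40})
```

After one failed and two solved quests the user holds 120 XP, which places them at level 2 under the assumed thresholds and at the top of the small example leaderboard.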


2 Game-Based Learning in Education 

Game mechanics are also used in higher education and in the field of e-learning (cf.
Papastergiou, 2009; Ebner & Holzinger, 2007). In this context the term game-based learning is often used;
it aims to increase students’ motivation as well as their learning engagement. According to
the OECD, high motivation during the learning process improves learning outcomes (Artelt, Baumert,
Julius-McElvany, & Peschar, 2003).

There is a wide variety of ways to realize this. On the one hand, game elements can be brought
into in-class courses, e.g. by arranging students in fixed learning groups (so-called guilds) that compete.
On the other hand, one can focus on digital game-based learning (Prensky, 2003) and implement a whole online
game with a virtual world or an online learning platform with partial elements of gamification. Combinations
of both kinds are also possible.

A well-known example of game-based learning in higher education is the teaching method of Lee
Sheldon (2011), who organizes his seminars like Massively Multiplayer Online Role-Playing Games
(MMORPGs). His main aim is to reach students in a way they are familiar with. As in a computer game,
the students can choose an avatar and fight together with their fellow students for better grades in the final
exams. By doing homework, handing in their tasks early or reviewing texts from other students they can
gain extra points to improve their grades. In this way the students have many small milestones during the
semester which can be achieved easily, and which therefore offer many encouraging experiences of
achievement. The feedback of the participants concerning the course structure shows that there are positive
effects on motivation as well as on learning results (Sheldon, 2011). The approach of Sheldon (2011)
alludes to online platforms like World of Warcraft (WoW) or similar games. Another example, which does
not put emphasis on the game mechanics in their entirety, is Quest to Learn, a school in New York. This
public school aligns its complete curriculum with learning by playing. In cooperation with teachers, parents,
game designers and students they focus on “rule-based learning systems, creating worlds in which players
(students) actively participate, use strategic thinking to make choices, solve complex problems, seek content
knowledge, receive constant feedback, and consider the point of view of others” (Quest to Learn, n.d.).


3 Evaluation Model for Gamified Concepts 


Following Fricke (2004), we asked whether the existing evaluation models are sufficient to evaluate a
game-based learning approach or whether new forms of evaluation are needed. Because the course designs are
innovative and complex, standard evaluation methods for learning environments and didactics do not cover all
relevant aspects. For this reason we developed our own model to illustrate the gamification approaches and all
adjacent areas (Figure 1). Our work serves as a window onto the process of evaluating
game-based learning concepts.

Against this background, the central evaluation aspects of this paper are: 


• the game-based learning course design
• the participants
• the acceptance
• the environment and the time.


The framework is derived from the model of Schumann and Stock (n.d.), which focuses on the evaluation of
information services, and is supplemented by Knautz, Soubusta, and Stock’s (2010) evaluation aspects for IT
system quality. Our model includes five different dimensions (Schumann & Stock, n.d.), of which the gamified
concept represents only one. Depending on the project, the concept can be divided into its
existing elements, e.g. the learning platform, the online game, the practical course with game mechanics, or
everything together. For each part we investigate the objective and perceived quality independently. The
objective quality comprises usability, efficiency, effectiveness and functionality. The perceived quality includes
usefulness, fun, trust, security and user-friendliness (Knautz, Soubusta, & Stock, 2010). Furthermore we
survey the participants of the concept with regard to their individual learning and information behavior as
well as their needs. At this point we want to investigate which information demands they have and how they
learn in order to meet their needs. They have a reciprocal relationship with their co-participants, both
competing with them and being supported by them. As another important aspect we look at the
acceptance, which is closely connected to the dimensions of the course concept and the participants. The
acceptance has a high influence on the learning outcomes, which are a central aspect of the evaluation model
(Quinn, Alem, & Eklund, 1999). It is furthermore based on the adoption of the concept as well as on its
use. In combination with the dimension of the users, the acceptance creates a network economy effect (“success
breeds success”), which refers to a steadily rising number of users as soon as the first users are convinced of
the concept and share their experiences.

Going further, the model does not examine the course concept in isolation. For this reason the fourth
dimension represents the environment. In this context we take a closer look at competition, subject area,
governance and marketing. The subject area represents the direct setting of the concept. The governance
aspect covers the constraints on the project, for example financial, personnel or time
restrictions. Competition and marketing are closely related. When competing course concepts are offered,
the participants can decide which one they would like to choose. In this case marketing is very important
to convince them of one form. Marketing serves the purpose of increasing attention for new
didactic methods. The last dimension represents time, which has a high influence on the evaluation results.
For example, at the beginning of the project the participants do not have a clear idea of how the
new didactic concept works or whether it offers them any advantages. By experiencing the concept they can
form their own opinion, based on the advantages and disadvantages they encounter.


[Figure: diagram with the labels Perceived Quality of the Gamified Concept (Perceived User-Friendliness,
Perceived Usefulness, Perceived Security / Trust, Perceived Fun, Other Factors), Perceived Quality of the
Content, Objective Quality of the Gamified Concept (Efficiency, Effectiveness, Functionality, Usability),
Participants, Learning Behaviour, Other Participants, Network, Use of Concept, Distribution of Concept, Results]

Figure 1: Evaluation Model for Game-Based Learning Approaches


4 Outlook for the Poster 


All presented evaluation aspects were investigated in the context of a gamified university course using
quantitative and qualitative methods such as questionnaires, log-file analysis and interviews. The collected data
are analyzed with statistical methods. In doing so we obtain a diversified evaluation result, based not only on
the course concept itself but also on the surrounding areas.

With the data we can discuss and answer the following questions, examples of which will be presented
in the poster:

• How were the game mechanics realized in the practical course and on the platform, and how
were they accepted by the students?
• What was the feedback on the traditionally delivered lecture?
• Are there any effects on the learning behavior and learning outcomes of the students?
Abstract

Specialist information archives, particularly in subcultural and non-institutional settings, present special 
considerations of local context, cultural appropriateness and use patterns. In this poster, we describe our 
work developing a system of classification, cataloging, and preservation measures in the context of Seomra 
Spraoi’s Forgotten Zine Archive, a Dublin, Ireland-based collection of underrepresented and ephemeral 
cultural materials. The goal of organizing and preserving these materials was achieved without 
compromising the noncommercial, do-it-yourself ethos of the materials and organization. Our actions 
contributed structure and method to a previously undervalued information domain, which may become 
important in the future as the cultural resonance of alternative media and zines in particular becomes 
more widely acknowledged. To develop these systems, key aspects of the Forgotten Zine Archive were
examined via a detailed user needs assessment questionnaire and approximately 3? months of weekly
participant observation sessions. The survey and observational data, viewed through lenses of existing 
theory and archival practices, were used to ground any practical decisions made. The collection of over 
1800 zines was then classified and cataloged based on this knowledge. Issues of preservation and 
digitization were also extensively considered. Once our field work was done at Seomra, we developed a 
set of more generalizable considerations, contributing to existing best practices in the domain of 
subcultural archiving. Through working with the Forgotten Zine Archive, we generated a set of 
conclusions and best practice suggestions, with potential benefit to any group, academic or otherwise, 
that wishes to undertake the maintenance of a similarly ephemeral collection. Catering the cataloging 
process to the audience and setting is vital when dealing with alternate media, and one effective way to 
approach this task has been established through this project. 
1 Introduction


As the historical value of fanzines and other independent, subcultural and self-published publications receives
greater and greater acknowledgment, established memory institutions are taking further steps
towards preserving such ephemeral materials. Wary of this potential institutionalization, however, many
zine publishers and underground community members have taken it upon themselves to establish rival
collections of their own. In doing so, these do-it-yourself, independent libraries preserve the original cultural
ethos of their publications while also striving to avoid perceived compromises or loss of integrity. The
independent libraries’ first-hand experience of zines affords the items a special significance, and they are
unlikely to be overlooked as a result. Despite this, the nature of independent work may lead to
uncommon access or preservation issues which would not be encountered if the collections were held in
established institutions.
This poster builds on Andrew Flinn’s idea of “archival activism” (2011), or DIY archives, by
examining the specific case of the Forgotten Zine Archive, an independently maintained ephemera collection
housed in Dublin, Ireland. We use this setting as a case study to illustrate some of the potential
challenges and opportunities inherent in many such unconventional collections, focusing in particular on
how established institutions should work with these groups rather than absorb their collections. For
example, independent publications such as zines often do not provide consistent and clear publication
metadata, which can lead to unique challenges for accurate classification. Additionally, unique constraints
related to the physical setting of these archives need to be considered. For example, the Forgotten Zine
Archive is housed in an autonomous social center, as opposed to a standard library setting, which can put
the collection at risk during center-wide social events such as regular café hours, film screenings, talks,
workshops and late-night music gigs.


2 Background 


Zines (short for “fanzines”) are usually defined as independent publications created by the members of a 
particular subcultural movement or “scene” and have been historically overlooked by the mainstream media. 
Because zines are generally created by figures central to a scene, they are often used as a primary source 
for mapping the development of those subcultures. Cohen and Lashua (2012) demonstrated this in their 
research on Merseysound, the postpunk zine from Liverpool, England. Merseysound is an example of how 
scenes can be influenced by their independent publications. “They provide a record of local punk and 
postpunk scenes, but they were also active agents within those scenes, helping to create the groups, 
identities, and ideologies involved” (Cohen and Lashua, 2012). 

Many non-institutional archives have evolved to cater to the niche audiences
of a given local social context or scene. Archival researchers have suggested this may have resulted from
fringe cultural groups being overlooked by mainstream collections, as well as a perception of such material
as being too controversial (Flinn, 2011). Many of those involved also have reason to be suspicious of
established institutions, having been marginalized for so long, and some hostility may remain. Subcultures
may also have concerns about potentially leaking insider or incriminating information; as Lingel et al.
examine in “Practices of information and secrecy in a punk rock subculture,” many subcultures still have
secrecy veils to keep outsiders outside (2012). Lingel et al. illustrate these veils within the context of a
particular musical subculture, in which the concerts themselves are illegal (due to non-regulation venues
and fire hazards) and many of the activities at the concerts can also be considered illicit (such as underage
drinking and drug use). As a result, the locational information of these events is considered sensitive, and if
any of this information were advertised in a zine, as it historically often was, that information would then
be public and the secrecy of the group might be compromised.

The Forgotten Zine Archive was created in 2004 by prominent Irish zine writer Ciarán Walsh
to establish a functional Irish zine collection. It was made up of around 1,200 zines, donated by four
separate collectors, and was a mixture of Irish and international zines. It was stored in a commercial
warehouse space in Dublin’s North Strand which was being used semi-legally as a residential space by
members of the Dublin punk, DIY, and independent scenes. The archive was initially opened for a few hours
every Sunday, but when the warehouse closed in 2005, the archive was forced to look for a new home. This
home was found in Seomra Spraoi, a non-hierarchical, anti-capitalist non-profit social center which has since
housed the archive in various locations across the city.

Housing the Forgotten Zine Archive in a social center such as Seomra Spraoi, and not in an
academic setting, presented some unique challenges that had to be addressed. There was tension between
traditional information organization and the ethos of the Forgotten Zine Archive and Seomra Spraoi, and
this manifested itself in the areas of access, cataloging, and preservation.
3 Approach/Methods


Seomra had a policy of open access to the zines. As the zines were not locked away, and there was no security
in the building, items could easily have been taken from the archive without anyone being aware. Whilst Seomra’s
culture of openness had to be respected, it was evident that greater protection of the zines would be beneficial
to the archive. When we began cataloging the archive, the culture of Seomra Spraoi was very important to
take into consideration. An over-authoritarian cataloging system would not have been in line with the nature of
the establishment in which the archive was stored, nor with the culture of the people who would use it.
Finally, finding a cost-effective method to preserve ephemera such as zines was a major challenge. As Seomra
is a noncommercial space run on a not-for-profit basis, there were minimal funds allocated to preserving the
zines. The Forgotten Zine Archive needed to be preserved more carefully than it had been, but given
Seomra’s financial situation, there were no funds available to do so.

Over five weeks during the summer, the entire Forgotten Zine Archive was
classified using the four designated headings (Artistic & Creative, Music, Political & Social and Resources),
alphabetized, labeled, and cataloged using LibraryThing. A basic, “user-friendly” approach to
cataloging was adopted, with a classification system that is intuitive to use. The collection has been
re-boxed, and each zine has been placed in a polyester sleeve with an acid-free cardboard backing for support
and protection, both of which are vital for the preservation of ephemeral items. A finding aid was created
so that users who are browsing the archive in person can find specific zines easily, particularly those who
may not have access to the online LibraryThing catalog at that time. The collection has been promoted via
blogging and social media such as Twitter and Facebook, giving the Forgotten Zine Archive an online presence
it never had before. Together, these measures have significantly increased awareness of the archive and the
enthusiasm of both potential and current users. Finally, the process of digitizing the collection has been
examined; the ethical issues surrounding the digitization of ephemera such as zines have led to the
recommendation that institutions consider carefully the implications of complete online replication of items.


4 Conclusion 


Issues of access, cataloging methods, preservation, target audiences, and digitization are nothing new to
information professionals. When dealing with alternative media, however, they take on a different tone. The
ethos and nature of the zine community is almost one of deliberate disorganization and rebellion, so trying
to create a collection of these items while keeping this ethos in mind is fraught with challenges. This does
not delegitimize the process, though. As Anderson (1999) states, “[u]nless aggressively pursued, librarians
would be fortunate to be aware of even 10 percent [sic] of the publishers publishing today. The other 90
percent [sic] remain obscure.” If we disregard zines and their importance simply because they are difficult
to categorize or maintain, that 90 per cent may remain obscure for the foreseeable future. The purpose of
this study was to develop a system for maintaining unique or specialist collections, and to preserve
underrepresented and ephemeral materials before they are lost. The best practices outlined in our poster
(ethos-oriented access policies, culturally sensitive user-driven classification, and accessible cataloguing and
record-keeping practices) contribute to the broader endeavor of developing cultural archives in similar
nonacademic settings.
Abstract

This work looks at the teaching of eastern educated students (largely instructivist) by staff teaching in
a western educational style (constructivist) when these staff themselves have been educated in an eastern
educational style. Berry’s fourfold model of acculturation strategies is applied to the primary research.
The work concludes that the staff modify their teaching and the materials they are provided with to
accommodate the two learning styles. This work also confirms that the teachers conform to Berry’s
‘creative assimilation’, as they display a ‘new’ form of the two learning cultures.
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1 Introduction 


1.1 Aim: 


Investigate the experiences of teachers in transnational education from the viewpoint of the teacher in the
host country.


1.2 Background: 


John Berry (2005 p697) stated that “during acculturation, groups of people and their individual members engage
in intercultural contact”; he looks at students from one country (or culture) moving to another. With
transnational education we are looking at the educational pedagogy moving to a host country. This is a backwards
application of the theory: does the constructivist learning theory acculturate to the (new) home country?


1.3 Research question: 

Does Berry’s fourfold model of acculturation strategies, i.e. assimilation, separation, integration and
marginalisation, map onto the constructivist learning theory as perceived by the instructivist educated
teacher who is now teaching constructivist based programmes? The research problem under investigation is
the impact that the different learning styles have on teachers delivering a western style of education, when they
themselves have had an Asian style of education, to students who have had an Asian education to date, and how the
teacher meets the needs and demands of the two styles.


1.4 Literature review: 


The term transnational education, or Transnational Higher Education (TNHE), can cover a multitude of
scenarios and models; it describes a system where a student studies (usually) for a degree in one country
(host) but the awarding institution is in another (home). There are benefits to be gained on all sides: the
host institution can teach at degree level, so offering a Western education to students from their own country, and
this degree will usually cost the student less than if they travelled to and lived in, for instance, the UK,
US or Australia. The research in this professional doctorate is carried out in Malaysia, one of the four main
importing countries of TNHE (Chiang 2012). There is existing research into the impact of the west’s
educational approach on TNHE, but this is directed at the impact of constructivist education on the
learner, and there is nothing on the impact on teachers delivering the approach if they themselves are
not from a constructivist learning background. How do these staff approach the west’s style of teaching if they
have not themselves been educated in that style?

Dunn and Wallace (2004) found from their student surveys that there was a difference in status
between local (Singaporean) staff and visiting Australian lecturers: the latter were more ‘expert’ than the
former in the students’ opinion. Some materials were Australian designed and delivered, and some locally
(Singapore) designed and delivered; the former were ‘more self-directed than those taught by the partner
organization’ (p293). Dunn and Wallace (2004) discuss Cheng and Wong’s (1996) work, in which Confucianism
influences education in certain parts of the world (including Malaysia), and in some countries learning
is more about ‘compiling from the work of masters than comparing or creating new knowledge’ (p295); they
are comparing instructivist education with constructivist education. Zhang (2007) details the role of Confucian
philosophy in Eastern educational and societal systems (see below):


• Teachers shown respect
• Learners learn knowledge
• Encourages learning together
• Large classes
• Pressure on students and teachers due to exams and their importance in teaching and learning
• Exam scores dictate performance of learners AND teachers
• Teachers use teacher’s guides to deliver uniform content
• Government policies responsible for textbooks and assessment.


Adapted from Zhang (2007 p302) 

As Zhang (2007 p308) puts it, ‘on the spectrum of instructivism versus constructivism, the Eastern learning
culture locates nearer the extreme of instructivist philosophy than the western learning culture’. Though this
promotes the acquisition of knowledge, there are drawbacks with respect to self-direction and critical
thinking.

An exam focus can leave the dynamic teacher behind: if they want to introduce learning
technologies to, for instance, encourage problem solving or innovation outside of the standard curriculum,
then the exam grade focus means that this is not really feasible.

There is a strong emphasis on the use of English as the instructional medium, but this raises some
debate in the research. For instance, Chiang (2012 p.183) raises the issue of the ‘pre-packaging and
mass-production’ of teaching materials without any acceptance of localisation of these materials and the context
in which they are taught to non-native students. This debate has existed for several years, and there have
been discussions and examples of where the localisation of course content has been made (Cheung 2006;
Chiang 2012; Smith 2010; Wilkins and Huisman 2012). But again the research is focussed on students and
high levels of analysis and recommendation. What is absent from this type of research is the impact the
obligation has on the host lecturer through whom the materials will be taught.

Berry (1997 p 6) asks the question “what happens to individuals who have developed in one cultural
context, when they attempt to live in a new cultural context?” This question applies to the digital native
student when they move from their home country to their country of study. It could equally apply to
the staff member teaching constructivist teaching materials if they are from an instructivist background.
The cultural context in question can be seen as not only that of a physical move from one country to 
another, but that of the educational culture from instructivist to constructivist and vice versa. 

Much research on acculturation uses the work of Berry as a basis, for instance his 1997 model. This
identified a person’s views or degree of acculturation:

Integrationist
Separationist
Assimilationist
Marginalised


Berry (1997) discusses the assimilation of people moving from home to host culture. Assimilation can be
broken down into two different kinds: creative, where the assimilated members display a new form of the two
cultures (their own home merging with their new host), and relative, where resistance to the change brought
about by the two cultures coming together is found.


1.5 Methodology 


This research employs Interpretative Phenomenological Analysis (IPA). IPA investigates occasions
when the ‘everyday flow of lived experience takes on a particular significance for people’ (Smith, Flowers
and Larkin 2009 p1). IPA looks in detail at the particular case; it is idiographic, as it studies a small number
of people with the same experiences. The experience of each of these individuals is investigated; any
similarities or differences in the experiences of these people are explored in detail. IPA needs a relatively
homogeneous sample, allowing similarities and differences of experience to be investigated
in detail. Data collection was through interviews with teaching staff who had been educated in the
instructivist style and were now teaching using British materials, in the constructivist style of education.
These interviews were recorded and detailed transcripts made. The qualitative data are analysed
through detailed coding of the transcripts to identify recurring patterns (e.g. feelings, ideas and
thoughts), which produce superordinate themes, each of which contains a series of sub-themes. The volunteers
for this research are from a Malaysian college teaching a British Computer Science degree. The staff have
been known to the researcher for up to ten years, as the franchise partnership means that UK staff visit the
Malaysian college three times per year. This relationship between the researcher and the local
Malaysian staff allows for the idiographic and hermeneutic approach to data gathering that is
essential for using IPA.


1.6 Provisional Analysis and discussion: 


By using IPA it was confirmed that all participants were educated in the Confucian, instructivist style. It
was also found that the home learning theory (i.e. constructivist) was integrated into the host’s predominant
learning theory (i.e. instructivist). The teaching staff, who had been instructivist educated, adapted the
constructivist teaching materials to accommodate the instructivist educated student. The staff provided the
students with additional guidance on the assessments with which they were not familiar; used the incentive
of ‘this will help you in your marks for the module’ as a carrot to get students to participate in tutorials; and
provided additional textbook-based learning materials to which the students were directed. This work also
confirms that the teachers conform to Berry’s creative assimilation, as they display a ‘new’ form of the two
learning cultures. From open-ended interviews, it was determined that the teaching staff had received an
instructivist education and all had found issues and complications in teaching western education to the
Asian educated students. Several issues and methods of overcoming them were identified, for instance the use
of additional teaching materials via YouTube, difficulties in running small group tutorials, and having to
elaborate the UK-set summative assessment.


2 Conclusion 

This research has shown that teachers who have been educated in an instructivist learning environment
are teaching constructivist degree-level programmes to instructivist educated students. The work shows
that the lecturers are adapting their teaching and the constructivist materials to accommodate the needs
and expectations of the students on the programmes of study.


Abstract

The objective of this study is to understand scholarly research practice in virtual, distributed 
collaborations by focusing on the flow of documents among the participants and to advance design 
guidance for supporting improved document practice across distributed collaborative platforms. To do 
so, we develop a theoretical framework on document practice highlighting the sociotechnical role of 
documents in digital infrastructure. This mixed-methods study will first conduct semi-structured 
interviews to understand document practices. The second phase of the study will collect trace data of 
documents as a way to understand how they change over time. In this poster, we report on the analysis 
of twelve interviews with social scientists working in virtual collaborations. Initial findings show that
social scientists organize their documents and scholarly work on emergent digital infrastructures. 
Although not ideal, emergent digital infrastructures provide stability for collaboration across time and 
space. 

Keywords: virtual organizing, social science, cyberinfrastructure 

Citation: Sharma, S., Snyder, J., Osterlund, C., Willis, M., Sawyer, S., Brown, M., & Szkolar, D. (2014). Document Practice as 
Insight to Digital Infrastructures of Distributed, Collaborative Social Scientists. In iConference 2014 Proceedings (pp. 1021-1024).
doi:10.9776/14355 

Copyright: Copyright is held by the authors. 

Acknowledgements: This material is based upon work supported by the National Science Foundation under Grant No.
OCI-12219445, the Institute of Museum and Library Services, and the iSchool at Syracuse University. Any opinions,
findings, and conclusions or recommendations expressed in this material are those of the author(s) and do
not necessarily reflect the views of the National Science Foundation.

Contact: skshar01@syr.edu, jasnyd01@syr.edu, coesterlund@syr.edu, mawillis@syr.edu, ssawyer@syr.edu, mlbrow11@syr.edu,
doroteaszkolar@syr.edu


1 Introduction 


Scholarly academic research is becoming more distributed and collaborative as information communication
technologies make it possible to collaborate, coordinate and organize scholarly work (Cummings and Kiesler,
2005; Palmer, Teffeau, and Pirmann, 2009). Collaborative science is conceptualized by the size and
shape of its collaborative members. There is a relationship between the size of the collaboration and the
resources necessary to coordinate and organize work across collaborations (Cragin, Palmer, Carlson and
Witt, 2010). Collaborative science may be small-scale, medium-scale, or large-scale in size. Small-scale
collaborations are understood as small science (Cragin et al., 2010). These collaborations comprise
several principal investigators who work together on research grants (Cragin et al., 2010). On the other
hand, large-scale collaborations are characterized as big science, or collaborations with hundreds of researchers and
dispersed resources. Using this framework to understand digital infrastructures highlights that the majority of
work on cyberinfrastructure supports large-scale collaborations.

The literature in cyberinfrastructure (CI) studies conceptualizes the common components necessary
to build and maintain a research infrastructure. Ribes and Lee (2010) have summarized seven main facets for
understanding cyberinfrastructure in large-scale academic research in the natural sciences and computer
science. These seven facets serve as important starting points for understanding digital infrastructures.
The first facet of cyberinfrastructure is to observe it as an infrastructure or a structured platform that is
maintained over time (Ribes and Lee, 2010). A way to analyze infrastructure is to follow people and
technology and observe areas of strain and areas of success (Ribes and Lee, 2010). This requires looking at
the technical work and the human work to understand how the infrastructure is being sustained and maintained,
and how it may scale up (Ribes and Lee, 2010). These facets of CI may be useful for understanding the digital
infrastructures of small-scale distributed, collaborative research, specifically, the social use of documents
and the technical use of documents.

The purpose of this research is to follow distributed collaborative social scientists to observe their 
digital infrastructures by looking at their documents. Documents provide insight into academic research 
work because they serve as important artifacts throughout the research phase. Previous work shows the
important role of documents in academic research. Brady and Berman (2005) describe social science
cyberinfrastructure as “multidimensional networks that include individuals, data sets, documents, analytic
tools, and concepts.” Palmer et al. (2009) describe documents as key resources for scholarly academic work.
To date, very few studies have looked at document practice or documents as a stable artifact to help
understand infrastructure. In this study, document practice is defined as the action or performance of or
with a document pertaining to using, sharing, combining, and storing. Traditionally, documents in research
work are defined as manuals, papers, journal articles etc. (Palmer et al., 2009). Both document practice and
documents are understood from the perspective of the participant, thus informing ground-up theory
formation. This research study will try to understand the sociotechnical nature of documents as a way to
elucidate digital infrastructures in academic research.


2 Methods 


To date, we have conducted semi-structured interviews with twelve social scientists who are part of distributed
collaborative research teams. Our interview protocol also included a structured checklist that captured
details about digital information and communication tools used by researchers in doing their work online.
Our sample was purposive, consisting of twelve social scientists from the information studies tradition. Our
sample criteria included researchers who were working on currently funded distributed collaborative research 
projects. Our rationale for selecting information scientists included two main factors. First, we believed that 
if any social scientists were going to be using digital infrastructures for collaboration, these technology- 
oriented scholars would represent “power users.” Second, because many of the information scientists we 
spoke with conduct their own research in the area of virtual collaboration and information technologies, we 
felt they would have a particularly generative ability to reflect on and articulate their practices. Interviews 
were transcribed, inductively coded and analyzed for themes pertaining to document practices and 
arrangements of digital infrastructure (Charmaz, 2006). Results from coding were compared to checklist 
data regarding the types of tools and licensing that are commonly used for distributed collaboration. 
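
The comparison between coded interview data and checklist data described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the participant identifiers, the tool names attached to them, and the function names are hypothetical placeholders, not data or procedures from the study.

```python
from collections import Counter

# Hypothetical checklist records: participant id -> tools ticked on the checklist.
# These values are invented for illustration and are not study data.
checklists = {
    "P01": {"email", "dropbox", "skype"},
    "P02": {"email", "google drive"},
    "P03": {"email", "wiki", "skype"},
}

# Hypothetical inductive codes: participant id -> tools mentioned in the interview.
interview_codes = {
    "P01": {"email", "dropbox"},
    "P02": {"email", "google drive", "skype"},
    "P03": {"email", "wiki"},
}


def tool_counts(records):
    """Count how many participants are associated with each tool."""
    return Counter(tool for tools in records.values() for tool in tools)


def compare_sources(checklists, interview_codes):
    """For each participant, list tools that appear in only one of the two sources."""
    report = {}
    for pid in sorted(set(checklists) | set(interview_codes)):
        ticked = checklists.get(pid, set())
        mentioned = interview_codes.get(pid, set())
        report[pid] = {
            "checklist_only": sorted(ticked - mentioned),
            "interview_only": sorted(mentioned - ticked),
        }
    return report


if __name__ == "__main__":
    print(tool_counts(checklists).most_common())
    for pid, diff in compare_sources(checklists, interview_codes).items():
        print(pid, diff)
```

Mismatches between the two sources are reported per participant rather than resolved automatically, since deciding which source to trust is an interpretive step for the researcher.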


3 Findings 

The interviews provided insights into participants’ documenting practices and the technical
platforms they use to conduct scholarly work. First, distributed collaborative research in social science is clearly
driven by documents. Across all the interviews, documents were the primary means to conduct research 
work. All twelve interviewees mentioned one or more of the following documents as they described their 
collaborative work: Word documents, PowerPoint slides, PDFs, email, and wikis. These were also the most 
reported documents in collaborations. When asked about their documenting practices pertaining to these 
documents, almost all participants described their collaborative work around a combination of editing, 
sharing, using, creating, versioning and/or drafting documents. Genres of these documents ranged from
paper drafts, field notes, and annotations to project-related files.

Second, based on the checklists, when it comes to storing, accessing, and exchanging documents,
research teams relied on commercial technology to conduct most of their document work. Respondents
described their documenting practices as being associated with certain technologies. For instance,
participants said that email is the primary tool for exchanging, sharing, and interacting for almost all
project work. Email was the most common online tool across all of our respondents for exchanging
documents. Respondents reported using software and hardware such as Microsoft Office applications,
browsers such as Firefox or Safari, and some common open source programs such as “R,” running on
Microsoft or Apple computers. To store documents, several participants used Dropbox and/or Google Drive.
Collaboration on documents primarily took place via Word document exchange over email or through
Google documents on Google Drive. Commoditized technologies such as Google Drive and Dropbox became spaces
for storing documents, managing documents, and collaborating with documents. Skype was primarily used
as a means to communicate with geographically dislocated members. Several participants noted that
certain documenting practices, and the technologies associated with those practices, depended on the
project phase. In this case, programs were added and dropped throughout the duration of projects as
needed. Some documenting practices were not associated with certain technologies. For example, when
describing the versioning of documents during writing phases, some displayed interesting versioning
processes. Two respondents described token passing, some also described titling documents, while many
relied on email as a tool for passing documents back and forth. One respondent talked about the lack of
interest in the organization in keeping up with wikis and technologies used to organize documents.

When looking at the programs and technologies respondents used, they varied greatly across 
collaborations. Participants described various conglomerations of technology combinations and no two 
groups had similar digital infrastructures. Some participants described how their arrangements were 
compiled while some described their ideal arrangements. For example, one participant concerned with group 
dynamics chose to use wikis because they are “democratic” in flavor and they serve as spaces where various 
documenting practices can take place from storing data to writing/compiling annotations. Others wished 
they had spent more time understanding technologies that could help their document practice. Others 
displayed strong dislike for Google documents. Several participants mentioned that using Google Drive is
not a preferred way of collaborating and organizing online, but it is currently the best option to situate and
coordinate writing research papers with many researchers. All participants described their lack of project
management or governing bodies to help with technology arrangements and provide a seamless foundation on
which work could be done. One respondent described the way they choose their suite of technology
arrangements. In this case, the participant looked for programs that required very little learning so as to 
not take time away from project goals. 


4 Discussion 


The document practice lens provides insight into the types of documents and the ways in which documents
are used in collaborations. Our respondents described their collaboration on documents and ways in which
they interact with group members. From this insight, documents provide a base for discussion on the ways
researchers work online. Document practice identifies some of the basic actions that are used when working
with documents online. Insights from our interview respondents display some of the areas where the
social meets the technical aspects of documents, highlighting digital infrastructures.

The digital infrastructures that we observed from respondents are very different from the
characteristics of CI noted in the literature. The distributed collaborative activities of our respondents
occur on cobbled-together commercial technologies. Their infrastructures comprise commercial software,
hardware, and Internet-based services to store documents, interact with dislocated colleagues, and write
paper drafts. Modularity of the technologies gave collaborations stability and flexibility. Social scientists in
our study did not discuss the use of computational power to analyze data or reflect on robust data sharing
mechanisms to conduct work. What they did express was the need for platforms where work can be shared and
stored across researchers seamlessly. This entails being able to coordinate collaboration on documents,
especially for writing papers. Digital infrastructures are compiled to take documenting practices
into consideration, which may reflect the importance of the types of work being conducted in distributed
collaborative social science.

Although digital infrastructures do not reflect the model of CI proposed by Atkins (2003), they do
provide similar functions that allow collaborations to survive and thrive for the duration of the project.
They provide stable yet fluid and evolving platforms on which distributed work successfully takes place.
What is surprising to notice is that these compiled information and communications technologies (ICT) do
not have governance structures or support systems to maintain them; rather, they are reflections of the
skills of the person who compiled them. These findings suggest that social scientists are inventively
appropriating ICT use into their scientific practice.


5 Conclusion 


In the next phase of the study, we ask: what technical arrangements provide the most stability for
document practices? Are there further insights we can get from looking at documents in their digital
infrastructures? How do technology arrangements look over time? These are some of the questions we look
to answer in the next phase of the study. In terms of methodology, the next phase of our study will be to
develop a trace data protocol in order to collect data on how documents flow in research collaborations.
Our work aims to advance theoretical understanding of document practice and of its
application to forming digital infrastructures in scholarly research.


Abstract

This poster presents preliminary work in adapting a ‘curation profiles’ approach to study data practices 
in a corporate enterprise setting. We outline important similarities and differences between the curation 
of basic vs. applied research data and present preliminary findings from a pilot study with design 
engineers at a multi-national corporation that manufactures heavy machinery. We show that
reproducibility, quality vs. value, and discovery-driven quality control are key areas for the
development of new curation services in this sector. We conclude with some future directions for
extending the curation profiles project to new data-intensive workplace settings.

1 Introduction


Data generated through research and development (R&D) activities are increasingly considered valuable 
assets with high potential for reuse, and important contributors to national economic stability (OSTP, 
2013). Library and information science has studied the curation of data for reuse in many diverse scholarly 
settings (e.g., Treloar et al., 2007; Choudhury, 2008; Cragin et al., 2011), including recent open government
data initiatives (Ding et al., 2010). However, there have been few studies that address how corporations are
attempting to preserve and curate their data for future analysis (Curry, Freitas & O'Riain, 2010). Moreover, 
there is little known about the ways academia, government, and the private sector differ or converge on 
basic data curation issues, such as the organization, representation, tracking, reuse, and sharing of data 
amongst their various research stakeholders. 

Curating and Profiling Enterprise Data (hereafter referred to as CPED) is a project investigating 
how both systems and services developed for data curation in academic science departments can be tailored 
to meet the unique demands of corporate R&D settings. This work is aimed most immediately at informing 
the development of new data curation infrastructures and services for corporate partners in CPED, but 
these findings will also provide iSchools a better understanding of the skills needed to curate data in diverse 
R&D contexts and can therefore drive the development of new curation curriculums. 


2 Profiling Data Collections 


CPED’s research methodology draws from the Data Curation Profiles Project (DCPP) which was designed 
to study how data infrastructures and curation services might be tailored to fit the specific needs of a 
research field, laboratory, or discipline in an academic setting (Witt, et al., 2009). Specifically, this project 
attempted to: 


• Enrich the understanding of data access and related curation activities through case studies of
researchers' data practices.
• Translate and compare needs for archiving and sharing data across campus units and institutions.
• Convert the results into formalized policies to enhance curation and access to data collections in a
repository setting. (Witt, et al., 2009)


Working with corporate partners, including a multi-national corporation that designs, engineers, and
manufactures heavy machinery, CPED is attempting to extend the research agenda of DCPP to the private
sector. In doing so, CPED will both investigate the research practices of different working groups, corporate
laboratories, and individuals in private industries, and create rich, generalizable case studies of data
access and preservation issues for each of these groups.


3 CPED Pilot Study 


Thus far, we have conducted a single pilot study to investigate how we might extend the DCPP curation profiling
approach to study enterprise-level data practices. The first phase of this work has included the customization
of a curation profile template and an interview guide to be used by the IT departments of CPED partners.
We have also completed 12 group interviews with data producers and consumers at one of our partners’
headquarters. This pool of respondents includes engineers from manufacturing, testing, and design
departments, as well as marketing and IT support staff. 

The first curation profile being developed from these interviews is focused on engineers who have
responsibilities in the design and drafting of a machine’s geometry, including the specifications and
limitations for appropriate machine use conditions, and in programming lower-level software controls used by
these machines. These engineers are unique in that they require access to test data that result from their
design prototypes being used in simulation and field experiments, as well as reliable data about materials
(e.g. properties, prices, stock availability, etc.) from external parts vendors. The reliance on multiple sources
and types of data therefore makes design engineers an ideal sample for studying data reuse and sharing 
practices. 


4 Enterprise-level data practices 


Many data management issues are persistent in collaborative research, such as the need for standardized 
metadata and consistent file naming conventions, but our initial interviews with design engineers have 
revealed three aspects of our participants’ data practices that are unique to this R&D setting:


External data are expected to be reproducible. 


Research conducted by design engineers is not focused on a basic understanding of a material, but 
instead on the performance of that material when used in a particular context. This places increased 
importance on reproducing and verifying material data obtained from parts vendors, as well as 
creating internal standards of quality that can be shared across teams using similar data. When
testing vendor-supplied material data, it is assumed that this information is not accurate and can only
be accepted on a “reputation basis” after many rounds of tests that verify the manufacturer’s
reported values.


Quality vs. value is an imprecise, and often flawed distinction 


Some design engineers would trace back and obtain legacy data they had produced to verify the
quality of a new dataset they received from field-test engineers; others only valued data coming
directly from the testing of their latest prototypes, regardless of known quality issues. Value and
quality were neither well-bounded nor well-defined concepts to our participants, and they dismissed
any discussion of their own research data along such lines. This is unusual, as most
participants in an academic research setting report quality or reliability to be the most important
criteria for determining the value of a dataset for reuse (Weber et al., 2012).


Discovery is a mechanism for quality control 


In many instances, datasets that were continually reused became de facto standards for design, and
were archived in many different systems and shared workspaces. These datasets persisted, not 
because of their high quality or uniqueness, but simply because they were easy to access, most team 
members knew of their existence by a generic name, and the errors that the data did contain were 
likely to have known “work-arounds”. Reuse of data with known problems is not unique to 
enterprise-level data practices (Zimmerman, 2007), but a dataset being archived in multiple 
locations so that it would become “known” to collaborators diverges from many previous studies of 
data practices (e.g., Cragin et al., 2010). 


5 Next Steps 


We will continue to code transcripts from these interviews, and our poster will present further analysis of 
the themes mentioned above. For the purposes of visualization, we have also developed a set of schematics 
that map how data move through the design engineers’ daily work routines. These maps allow us to trace 
the data between points of production and consumption, and will be used in the poster presentation to 
show useful points of intervention for data curation. We believe there is much to be learned from settings 
where applied R&D research is taking place, and our future work will include a more comprehensive analysis 
of the differences between basic and applied research data practices. 
Abstract 

This work-in-progress poster reports on the preliminary findings regarding college students’ value 
structure of how to choose and utilize mobile health/wellness applications. We have conducted surveys 
and follow-up interviews with college students who have been using mobile health/wellness applications. 
In this poster, we analyzed the survey data from sixteen participants and the interview data from five 
participants (three females and two males). The analysis showed that the most important purposes of 
using mobile health/wellness applications for college students were recording and managing personal 
health information/records and keeping up with their fitness plans. For selection criteria, easy to 
navigate, easy to use, quality of content, customizability, and ratings from other users seemed to play 
the most important role in college students’ choices of certain mobile applications among alternatives. 
1 Introduction 


Mobile devices continue to gain popularity. For instance, 56% of American adults aged 18 or older 
owned a smartphone as of May 2013, and 79% of younger adults aged 18 to 24 had one (Smith, 2013). 
Accordingly, mobile applications on smartphones have 
become useful channels for distributing tailored health/wellness information as well as tools for monitoring, 
recording, quantifying, and managing the user’s health/wellness activities. As of September 2012, 19% of 
smartphone owners used health-related mobile application(s) (Fox & Duggan, 2012). 

Mobile application stores (e.g., iTunes, Google Play) list hundreds of thousands of mobile 
applications. However, it is not always clear whether those applications are grounded in credible sources, 
such as medical and kinesiology research. In addition, there is little research examining the value 
structures that guide how consumers search for and select mobile applications among alternatives. More research is 
needed to better understand the mechanisms by which consumers perceive the usefulness and quality of mobile 
health/wellness applications. Ultimately, identifying the structure of consumer decision-making in selecting 
a health/wellness application would inform the design of mobile health and wellness applications and 
ranking algorithms for search engines and online stores and align them better with the consumer’s perception 
of usefulness and quality. 


2 Related Research 


Although there is little research on the use of information sources that may influence consumer decision-making 
when selecting mobile applications, there is prior research on quality, credibility, and consumer 
opinion/sentiment analysis. Quality is generally defined as “fitness for use.” The quality of information 
products and services can be evaluated either directly through a systematic evaluation and use or indirectly 
by using different cues and heuristics (Stvilia, Mon, & Yi, 2009; Sundar, Knobloch-Westerwick, & Hastall, 
2007; Wilson, 1983; Winker et al., 2000). 
For mobile applications, the small keypads, displays, and limited processing power of mobile devices 
pose new usability challenges to designers. Venkatesh, Ramesh, and Massey (2003) found that Web 
usability guidelines in general might not be directly applicable to the mobile Web. Also, Kjeldskov and Stage 
(2004) showed that usability issues with mobile systems can arise because users operate mobile devices 
while in motion and hence experience a higher physical and cognitive workload than users of stationary 
devices. 

The software quality literature can guide this research as well. ISO (2011) defines software quality 
as a concept which comprises the following characteristics or criteria: functional suitability, reliability, 
performance efficiency, operability, security, compatibility, maintainability, and portability. In reality, 
however, most mobile application users do not have access to the source code of an application, or the 
ability to evaluate it. Thus, they can evaluate these criteria only through use of the application 
or indirectly through quality markers and social metadata such as other users’ evaluations and quality 
incident reports. The literature provides several typologies of software quality incidents and quality 
problems (e.g., Fenton & Pfleeger, 1991; Fenton & Neil, 1999). 

More research, however, is needed to examine what consumer expectations of and priorities for 
mobile health/wellness application quality and quality cues are. Moreover, it is necessary to develop 
integrated model(s) and a knowledge base for the indirect evaluation of mobile health/wellness applications 
by consumers. 


3 Research Questions 


This study addresses the set of research questions below: 


• RQ1: What kinds of mobile health/wellness applications do students use? 

• RQ2: What are the purposes and features of those mobile applications? 

• RQ3: How do students search for mobile health/wellness applications? 

• RQ4: What are the metadata, social cues, and strategies that students use to select a mobile 
health/wellness application among the alternatives? 


4 Methods 


4.1 Instruments 


The information quality (IQ) criteria developed by Stvilia et al. (2009), the typology of software quality 
(SQ) proposed by Fenton and Pfleeger (1991), and a review of the related literature guided the construction 
of a survey instrument and interview protocol. The initial version of the instrument was pilot tested with 
eight doctoral students in July 2012 and revised based on their comments. 


4.2 Data 


A purposive sampling approach was used to recruit appropriate subjects for the research (i.e., college 
students using mobile health/wellness applications). Participants were recruited through the Facebook page 
of the University’s student fitness and wellness center. Each participant completed a survey and a follow-up 
interview in one-on-one, face-to-face meetings. Data collection is still underway. As of September 
15, 2013, sixteen participants had been recruited. This poster presents a preliminary analysis of the survey 
data from all sixteen participants and the related interview data from five randomly selected participants (three 
females and two males). 


5 Preliminary Findings 


62.5% of the participants (10 out of 16) were female students. In terms of ethnicity, the majority of the participants 
were White Caucasians (13 out of 16; 81.3%), two were Hispanic or Latino (12.5%), and one defined himself 
as multiracial (6.3%). In terms of education level (status), 75% (12 out of 16) were undergraduates, 18.8% (3 
out of 16) were graduate students, and one was pursuing a non-degree certificate in the college. 
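
The reported shares can be re-derived from the raw counts given in the text. The following is a minimal sketch in Python, not part of the study's own analysis: the helper name `pct`, the dictionary keys, and the choice of half-up rounding to one decimal place are assumptions made here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

N = 16  # number of survey participants reported in the text


def pct(count, total=N):
    """Return count/total as a percentage, rounded half-up to one decimal."""
    value = Decimal(100 * count) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Raw counts as stated in the text; the key names are illustrative only.
counts = {
    "female": 10,
    "white_caucasian": 13,
    "hispanic_or_latino": 2,
    "multiracial": 1,
    "undergraduate": 12,
    "graduate": 3,
}

shares = {group: pct(count) for group, count in counts.items()}

for group, share in shares.items():
    print(f"{group}: {counts[group]} of {N} = {share}%")
```

Half-up rounding is used instead of Python's built-in `round`, which rounds exact halves to the nearest even digit and would therefore turn 81.25 into 81.2 rather than 81.3.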

The most frequently mentioned mobile applications by the participants were MyFitnessPal and 
Lose It! (6 out of 16). Other applications mentioned by more than one person included: Nike+ Running, 
Runtastic Pro, C25K Free, MapMyRun, and Fitbit. 

87.5% of participants indicated that they use mobile health/wellness applications to record and 
manage personal health information/data/records. 81.3% used these applications for keeping up with a 
fitness plan, and 43.8% (7 out of 16) used them for designing a fitness plan. 

Regarding RQ3 (how students search for mobile health/wellness applications), 
several participants mentioned in the follow-up interviews that they learned about the applications from 
health/wellness-related articles on the Web or in magazines; other participants said that they went directly 
to application stores (e.g., iTunes or Google Play) and searched for applications by using terms representing 
the functionalities they were looking for, such as calorie counter, nutrition facts, and running: 


“I read about it in a magazine ... it wasn’t an advertisement per se. Just saying, ideas of good to 
use to check your calorie” (p04). 


“I think I searched something similar to nutrition facts in the app store, and this was one of them 
that showed up” (p05). 


Criteria                                                            Mean  SD

Easy to navigate                                                    6.63  0.62
Easy to use                                                         6.44  0.89
Provides high quality content                                       6.31  0.87
Allows personalization                                              6.19  0.83
Has high ratings from users                                         6.13  0.89
Free (No charge)                                                    6.00  1.26
Includes little ads                                                 5.81  1.47
Ranked high by search engines or mobile app stores                  5.50  1.32
Looks professionally designed                                       5.31  1.54
Is linked to by a site you think is believable                      5.31  0.89
Provides additional health/wellness information and tips            5.00  1.32
Recommended by friend(s)                                            4.88  1.67
Have a good experience with the related website                     4.81  1.11
Represents/produced by an organization you respect                  4.69  1.14
Recommended by social media                                         4.56  1.26
Includes a clear privacy policy                                     4.50  1.55
Recommended by a newspaper/magazine                                 4.44  1.36
Recommended by a doctor                                             4.38  1.63
Includes sources, author credentials, and affiliations for content  4.00  1.32
Has a third party quality approval/review seal                      3.94  1.00
Is advertised on the radio or TV                                    3.81  1.60
Displays an award it has won                                        3.69  1.20
Represents/produced by a non-profit organization                    3.38  1.36


Table 1: College Students’ Perceptions on Criteria for Choosing Mobile Health/Wellness Applications 
6 Discussion and Future Research 


While it is premature to draw any generalizable conclusions from such a small sample (N = 16), this 
research offers some insight into college students’ use of mobile health/wellness applications and their 
value structure for choosing applications. In terms of the IQ criteria (Stvilia et al., 2009), the 
participants’ application choices tended to rely on usefulness (ease of use, utility, objectivity). In particular, 
ease of use and utility appeared to strongly influence their perceptions of application quality; as 
shown in Table 1, Easy to navigate and Easy to use were the two most important criteria, and both 
are directly related to the IQ criterion ease of use. 

In addition, multi-purpose applications (e.g., MyFitnessPal and Lose It!) were more popular (i.e., 
more frequently used) than single-purpose applications (e.g., Runtastic Pro, C25K Free); this aspect 
is related to the IQ criterion utility. Many participants seemed to enjoy having various functionalities in a single 
application, such as monitoring various types of exercise (e.g., running, weight lifting, cycling), keeping 
track of personal health/wellness-related information/records (e.g., weight, height, caloric intake/loss, 
calorie information of foods), setting goal calories for the day, and calculating calories to be burned to 
meet the goal. 

The immediate future research plans include the collection of additional data and the construction of 
a model of mobile health and wellness application selection and use by consumers. 
Abstract 

Valuable Initiatives in Early Learning that Work Successfully (Project VIEWS2) is a study funded by an Institute of 
Museum and Library Services (IMLS) National Leadership Research Grant, with the objective of 
providing evidence-based methods for planning and evaluating the outcomes of public library early 
literacy programs. This study, unusual within Library and Information Science (LIS) research, consisted 
of a two-year experimental design with an online intervention. Forty libraries throughout the U.S. State 
of Washington were randomly assigned to one of two conditions: control (20) and treatment (20). The 
focus of this poster is how the design of an intervention, administered during Year Two of the 
study, broke down walls among the experimental librarians through the use of Information and 
Communication Technologies (ICTs). The existence of an ongoing community of practice across 
geographic boundaries will be verified by post-intervention surveys and in-depth phone interviews. 
1 Introduction 


Valuable Initiatives in Early Learning that Work Successfully (Project VIEWS2) is a study funded by an 
Institute of Museum and Library Services (IMLS) National Leadership Research Grant. The Project 
VIEWS2 study, unusual within Library and Information Science (LIS) research, consisted of a two-year 
experimental design with an online intervention. The overall objective of the research was to provide 
evidence-based methods for planning and evaluating the outcomes of public library early literacy programs. 
Forty randomly assigned libraries (13 large, 13 medium, and 14 small) throughout the U.S. State of 
Washington were included in the two-condition study: control (20) and treatment (20). The focus of this 
poster, rather than on the overall objective of the research, is to look at how the design of an intervention, 
administered to the experimental librarians during Year Two of the study, broke down walls among the 
librarians widely separated geographically, through the use of Information and Communication Technologies 
(ICTs). Methodologically, two pre-intervention surveys provided knowledge of the librarians’ context and 
led to the design of a connected learning experience. The existence of an ongoing community of practice 
across geographic boundaries will be verified by post-intervention surveys and in-depth phone interviews 
prior to submission of the final poster draft. Findings of this research can be applied to designing other ICT 
educational situations. 
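
As an illustration of the two-condition design described above, the sketch below splits forty libraries at random into control and treatment groups of twenty each. It is only a sketch under stated assumptions: the library identifiers, the fixed seed, the function name `assign_conditions`, and the use of simple (unstratified) randomization are all hypothetical, since the text does not describe the actual assignment procedure.

```python
import random


def assign_conditions(libraries, seed=2014):
    """Randomly split the libraries into two equally sized conditions.

    The seed is an arbitrary value chosen only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(libraries)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "control": sorted(shuffled[:half]),
        "treatment": sorted(shuffled[half:]),
    }


# Hypothetical identifiers standing in for the forty participating libraries.
libraries = [f"library_{i:02d}" for i in range(1, 41)]
groups = assign_conditions(libraries)

print(len(groups["control"]), len(groups["treatment"]))  # prints: 20 20
```

A design that balances large, medium, and small libraries across conditions would shuffle within each size group instead; whether the study did so is not stated.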
2 Theoretical Basis for Intervention Design 


EL-Capstone¹, a research-based survey instrument administered before the intervention, revealed to us that 
though librarians indicated a reasonable comfort level with the early literacy core knowledge to be presented 
in the intervention, they did not feel comfortable implementing the knowledge in their practice. We wanted 
to create an intervention that would build confidence through peer support and shared learning 
opportunities, thereby breaking down the geographical barriers among the participating librarians. 

Therefore, a social constructivist approach to communities of practice and the principles of 
connected learning incorporated in webinars were chosen as appropriate theoretical bases for an intervention 
designed for librarians who were too widely distributed to meet in person. 

Communities of practice have long been considered a successful mode of adult learning and a 
method for building confidence. As noted by Yukawa, “communities of practice stress that learning is not 
merely knowledge acquisition but more fundamentally a process of identity formation and empowerment 
through participation in learning communities” (2010, para 7). While much has been written on 
communities of practice as a whole, we chose to explore this particular community of youth services 
librarians in the context of using planning and evaluation tools, developed by the research team, that aim to 
measure and inform early literacy learning in storytimes. 

Connected learning is a recently introduced concept, originating from the work of Mizuko Ito and 
others on the learning of youth in a networked society (Ito et al., 2013). However, several of the principles 
of connected learning apply as readily to adult learners in an ICT learning situation. In our design of the 
meeting, we chose to incorporate two aspects of the Connected Learning principles and values: “Peer-supported,” 
i.e., contributing, sharing and giving feedback in inclusive social experiences that are fluid and 
highly engaging (para 17) and “Social Connection,” i.e., learning becomes meaningful through relationships 
(para 15), believing that these would help facilitate a community of practice and a cohort of experts. 


3 Context Informs Design 


Two surveys of the librarians gave us needed information about the context of the intervention. In addition 
to EL-Capstone, we conducted a survey to assess each librarian’s current technological capabilities and work 
obligations, and we provided assistance in preparing their environments. As a result, we created an intervention 
design that offered maximum flexibility and consistency with respect to meeting dates and times, and 
we provided hardware and software support wherever possible to facilitate community and participation. 


1 Capps, J. (2011). EL-Capstone Scale Development: An Emergent Literacy Concept Inventory. Electronic Theses, Treatises and 
Dissertations. Retrieved from http://diginole.lib.fsu.edu/etd/4241 
Webinars were offered at three key time frames: 
A = Monday 9 am 
B = Tuesday 2 pm 
C = Friday 10:30 am 


Figure 1: Intervention Schedule 


The final design consisted of two initial training webinars, offered at specific times, that were intended to 
be interactive, collaborative spaces between trainers and trainees. Periods of independent discovery and 
exploration in-between webinars enabled librarians to incorporate the learned strategies into their storytimes 
and build their own levels of expertise. Additionally, we provided seven weekly emails to the participating 
librarians with storytime tips and related indicators for the age ranges; these were also posted on the website 
along with recordings of the webinars for their review. A third and final webinar gave the librarians an 
opportunity to come together once more and share their experiences as they continued to grow in their 
professional practice. 


Figure 2: Participating Libraries in Washington State (labels recoverable from the figure: 14 small libraries; 13 medium libraries) 


At the conclusion of the training, the librarians were asked to complete EL-Capstone once again as well as 
a follow up survey in which they were asked about: 


• the usefulness and ease of the webinars 

• the usefulness and ease of the weekly emails 

• the training website 

• what they took away from the training 

• creating a sense of community among the librarians 
4 Data Analysis 


The second administration of EL-Capstone showed that the treatment-condition librarians gained 
confidence in their ability to incorporate their early literacy knowledge into storytimes while the control 
group did not. In general, the responses of the experimental librarians to the survey questions were positive 
and will be shared; suggestions for improvement, such as providing agendas ahead of time, will also be 
shared. In reviewing the results of the survey, we found several common themes that demonstrated a shared 
positive experience of the webinars alongside a range of perspectives. Word clouds 
constructed from the post-intervention surveys enabled us to convey the responses in a meaningful way to 
the librarians. 
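
A word cloud is driven by term frequencies, so the step behind such a figure can be sketched as a simple count of the words in free-text survey answers. The sketch below is illustrative only: the two response strings are invented placeholders (not data from the study), and the stop-word list and the function name `term_frequencies` are assumptions; the tool actually used to build the clouds is not named in the text.

```python
import re
from collections import Counter

# A deliberately small, illustrative stop-word list (an assumption).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "were", "was", "in", "for", "i", "my", "it"}


def term_frequencies(responses):
    """Count how often each non-stop-word appears across all responses."""
    counts = Counter()
    for text in responses:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


# Invented placeholder answers, NOT actual survey responses.
placeholder_responses = [
    "The weekly emails were helpful reminders",
    "Helpful tips for my storytimes",
]

freqs = term_frequencies(placeholder_responses)
print(freqs.most_common(1))  # prints: [('helpful', 2)]
```

In a word cloud, each term would then be drawn at a size proportional to its count.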
Figure 3: Visual Cloud of recurring themes relevant to usefulness of weekly emails 

Figure 4: Visual Cloud of recurring themes relevant to the usefulness of webinars 


5 Moving Forward 

Telephone interviews will be conducted with librarians to gauge long-term effects of the connected-learning 
experience. Full results will be available soon. We intend to learn more about 
peer learning, continued collaboration, whether a sense of community has been gained through the study, 
whether geographical barriers have been broken, and how to sustain a community beyond the scope of the 
study. 


6 Significance of the Study 


The component of this research study concerning the design of an intervention using ICTs provides both 
a theoretical and a methodological model for breaking down geographic barriers when participants in 
educational experiences are too widely dispersed to meet in person. The pre- and post-test experimental 
design is rare in the LIS research literature. The study assesses a long-standing theoretical principle, i.e., 
communities of practice, in a virtual environment and applies principles from a newer concept, 
connected learning. It reinforces the idea that not only knowledge but also context is important 
in the design of education using ICTs. 


7 References 

Capps, J. (2011). EL-Capstone Scale Development: An Emergent Literacy Concept Inventory. Electronic 
Theses, Treatises and Dissertations. Retrieved from http://diginole.lib.fsu.edu/etd/4241 

Feldman, E. N. (2011). Exploring the Impact of a Platform for Professional Development: Validating and 
Examining a Curricular Planning and Assessment Framework Using Mixed Methods. Retrieved 
from http://gradworks.umi.com/3485597.pdf 




iConference 2014 J. Elizabeth Mills et al. 


Ito, M., Gutiérrez, K., Livingstone, S., Penuel, B., Rhodes, J., Salen, K., Schor, J., Sefton-Green, J., & 
Watkins, S. C. (2013). Connected Learning: An Agenda for Research and 
Design. Irvine, CA: Digital Media and Learning Research Hub. 

Yukawa, J. (2010). Communities of Practice for Blended Learning: Toward an Integrated Model for LIS 
Education. Journal of Education for Library and Information Science, 51(2), 54-75. 


8 Table of Figures 


Figure 1: Intervention Schedule 
Figure 2: Participating Libraries in Washington State 
Figure 3: Visual Cloud of recurring themes relevant to usefulness of weekly emails 
Figure 4: Visual Cloud of recurring themes relevant to the usefulness of webinars 




Identifying Description Indicators for Research Data from Scientific Journal 
Publications 


Tiffany C. Chao¹ 
¹ Center for Informatics Research in Science and Scholarship, Graduate School of Library and Information 
Science, University of Illinois at Urbana-Champaign 


Abstract 

To support the sharing and reuse of scientific research data, rich description of the data must 
be made available. Scientific journal publications are a potential source of contextual 
details about the collection, generation, use, and analysis of data that are critical for meaningful 
interpretation. This poster presents an exploratory study of what information related to data can be 
identified from published literature on soil science research. The preliminary findings reveal the range of 
information detailed about data within journal publications, including discussion of data sources, referenced 
techniques and processes applied to data, and description of how data variables were collected and 
derived. With the growth of digital data, these findings will contribute to the development of a systematic 
approach for enhancing description in data curation systems and services and fostering data reuse. 
1 Introduction 


The provision of description and metadata is essential for the discovery, sharing, and reuse of research 
data. However, obtaining such information from those involved in producing the data is a time- and 
resource-intensive process. Descriptions are beneficial for accounting for what data have been collected and are 
available, but can also provide insight into how and why data were created, and explain anomalies or areas 
of uncertainty that arose during the research process. The growing adoption of scientific workflow systems 
demonstrates an automated alternative to manually documenting data production throughout the 
research lifecycle (Littauer et al., 2012). Other systems, such as the UniProt (http://www.uniprot.org/) 
database for protein data, also curate annotations, generated both automatically and manually, which 
contribute to a more robust provenance record for the data. However, the use of scientific workflows or 
automated tools for documentation is still not widespread across scientific domains including small science 
research (Davis et al., 2012). Small science research studies garner a significant portion of scientific funding 
in the US yet the ad hoc documentation and use of metadata standards make the data generated from these 
studies difficult to readily access or reuse by others (Heidorn, 2008; Wallis, Rolando, & Borgman, 2013). 
Similarly, survey findings reported by Tenopir et al. (2011) suggest that scientists generally are not active in 
applying metadata to describe their datasets, with only some utilizing locally developed standards. These 
contrasts and variations in metadata use and documentation practice for data further exacerbate the 
challenge of securing description information to foster future use of the data. With increased attention to 
the development of infrastructure and services for the curation and long-term management of research data 
in libraries, archives, and repositories, identifying an approach to procure description information for 
available data is needed. 

Data are a key part of the foundation underlying scholarly journal publications and are increasingly 
becoming accessible as supplements to published articles (Borgman, 2012) or embedded as part of online 
journal publications allowing for user interaction and annotation (Attwood et al., 2010; Renear & Palmer, 
2009). Scientific journal publications remain a primary mechanism of communication among scientists and 
scholars, advancing scientific knowledge and innovation and providing meaningful information units for 
further discussion and analysis (Brown, 2010). The descriptive content and embedded data representations 
(e.g., figures, tables, charts) of journal articles also play a vital role for researchers in verifying the 
reliability of data for reuse (Faniel & Jacobsen, 2010) or as information sources to discover data for new 
inquiry (Davis et al., 2012). Given the prominent role of journal articles within the scientific community 
for communicating scholarly information and as a resource for data discovery and study, there is potential 
for publications to be used as a source for informing data description for curation. This study investigates 
what indicators related to data can be identified within the content of journal publications to support 
continued curation of research data. 


2 Method 


In this exploratory study, nine full-text articles were collected from three peer-reviewed journals in the soil 
sciences: Soil Science Society of America Journal, Applied Soil Ecology, and European Journal of Soil 
Biology. The selected journals are considered top tier in the field based on published rankings from Scimago 
(http://www.scimagojr.com/) and Thomson Reuters Journal Citation Reports. Soil science is investigated 
as it is representative of small science research where data generated are in high need of curation support 
and primarily analyzed and used locally within a research group (Cragin et al., 2010). In addition, the 
rigorous research data collection procedures and generation of heterogeneous data types for analysis, along 
with the rise in meta-analysis research which necessitates consultation of different datasets and results, 
suggest that a high level of detail related to the data will be documented and represented within soil science 
publication content. 

As a starting point, research articles published between 2006 and 2011 were selected at random for this 
exploratory sample. Descriptive coding of the articles was performed manually, with the initial codelist 
derived from a functional vocabulary introduced by Cragin, Palmer, and Chao (2010) that maps 
relationships between data characteristics, research practices, and curation activities, and gradually refined 
with emerging themes based on subsequent rounds of coding. 
Figure 1: Examples of indicators identified from a sample article (Truu, Truu, & Ivask, 2008), including how 
these data details are represented in situ to describe the context of data collection and processing for 
analysis. 


3 Preliminary Findings 

Across all the articles from the different journals, several themes became visible regarding available 
information related to data and associated research practices. Figure 1 details an example of the description 
indicators identified from a journal article on soil microbiological and biochemical properties assessment. 
The article encompasses details of the study site where collection of soil samples occurred (data collection 
site; data type collected), the instruments and techniques applied in collecting and processing the soil 
samples including units of measurement (instrument; named/cited technique), and what soil microbiological 
variables comprise the dataset for statistical analysis (statistical analysis technique; data for analysis). 

In considering curation implications based on the available indicator details, the description of the 
study site provides rich contextual evidence regarding the data source which contributes to the provenance 
of the data. The applied techniques and instruments used are critical for future replication and may also 
provide insight to known standards within a given discipline for techniques that are used to generate 
particular data. These description indicators of data collection site, data type collected, instruments, named 
or cited techniques, data for analysis, and statistical analysis technique were consistently observed in the 
sample, although the degree of elaboration for each indicator varied between articles within the same 
journal. Some articles also detailed quality control practices, such as the removal of particulates that 
surpassed a certain threshold and homogenization of soil samples for analysis. Additional sources of 
description information about data were found in the succinct captions for tables and figures, often relaying 
resulting relationships between different data variables; aligning what variables were assessed or measured 
with how they were generated or derived appears to be possible based on the details of a publication and 
may contribute to the value of these data. 
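
The observation that certain indicators were consistently present can be expressed as a set intersection over the coded articles. The sketch below is a minimal illustration, not the study's method: the six indicator names are taken from the text above, but the article identifiers, their codings, and the function name `consistently_observed` are hypothetical.

```python
# Indicator names as listed in the text above.
INDICATORS = [
    "data collection site",
    "data type collected",
    "instrument",
    "named/cited technique",
    "data for analysis",
    "statistical analysis technique",
]


def consistently_observed(coded_articles):
    """Return the indicators that appear in every coded article."""
    code_sets = [set(codes) for codes in coded_articles.values()]
    return set.intersection(*code_sets) if code_sets else set()


# Hypothetical codings for two articles, for illustration only.
coded_articles = {
    "article_A": set(INDICATORS) | {"quality control practice"},
    "article_B": set(INDICATORS),
}

common = consistently_observed(coded_articles)
print(sorted(common))
```

An indicator coded in only some articles, such as the quality control practice in this made-up example, drops out of the intersection, which mirrors the distinction drawn above between consistently observed indicators and those detailed in only some articles.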


4 Future Work 


To maximize the potential for data sharing and reuse, the provision of rich data description is necessary. 
Preliminary results from this study suggest that journal publications are a productive resource for identifying 
contextual information about the collection of data and how these details are represented, such as cited 
references for techniques used or numerical values. Next steps include increasing the publication sample 
size within soil science for a more detailed analysis of journal article content to solidify the observed themes. 
Specific attention will be given to trends in cited references for techniques and developing a more systematic 
approach to determining the presence of a data description indicator. It will also be helpful to see how this 
approach extends to other disciplines within the small sciences. Additional exploration of available tools is 
needed to more fully understand how description information for data can be extracted from research 
articles. Establishing a concrete base for indicators can have potential implications for the development and 
advancement of automated processes to capture and enhance data description in supporting data 
repositories and curation services. 


Abstract 

This poster presents a study that examines online profiles of 300 U.S. public libraries on Twitter to 
analyze how public libraries are constructing online identity and creating a visibility and voice in social 
space through photographs, images, biographical text on profiles, and public postings. The study also 
examines library activity levels, followers, PeerIndex scores and other metrics in exploring the influence 
and impact of public library presence within social space, and the impression management and 
information sharing activities of influential public libraries on Twitter to better understand how libraries 
can integrate information services into new and emerging online social spaces. 


1 Introduction 


With the rise of Web 2.0 and social media, perceptions of the role of public libraries are changing to include 
new ways of engaging with the public, from participatory “library as conversation” models (Lankes et al., 
2007) to the embedding of library services within the “lifeflows” where users are engaging with information 
(Brophy, 2008). In an ethnographic study of one library using social media, Carlsson (2012, p. 207) notes a 
staff member’s comment: “We’re stuck in the site structure of website for the whole city... makes it harder 
for people to find us. Facebook and Twitter are very useful for that.” The growing presence of public libraries 
within social spaces is underlined within two other recent studies (Del Bosque, Leif & Skarl, 2012; Stuart, 
2010) that point in particular to sharp increases during 2009 in the number of libraries using Twitter. 

This proposed poster for iConference 2014 will discuss results from research exploring how the 
“footprint” of today’s public library now increasingly includes a virtual branch presence within social spaces 
such as the Twitter microblogging environment. The study examines online profiles of 300 U.S. public 
libraries on Twitter to analyze how public libraries are constructing online identity and creating a visibility 
and voice in social space through profile photographs and images, biographical text on profiles, and public 
postings. The study also assesses library activity levels and influence metrics to explore the influence and 
impact of public libraries in social space, and seeks to identify information sharing and impression 
management behaviors of highly influential public libraries using Twitter. 

This study asks the following research questions: 


1. How are public libraries constructing online identity, visibility and voice on Twitter through 
information revealed and concealed in profile photographs, biographical text on profiles, and online 
postings? 

2. How are public libraries presenting their role and mission through their Twitter profiles? 

3. What are the information sharing and impression management behaviors observed in public 
libraries’ online interactions and information sharing activities within the microblogging 
environment of Twitter? 

4. What are the impression management and information sharing behaviors of influential public 
libraries on Twitter? 

In this study, online profiles of 300 U.S. public libraries on Twitter downloaded on August 31, 2013 are 
studied using content analysis to identify visual self representations of the public library in social space 
through the profile images and backgrounds selected by the libraries. Libraries’ self-representations often 
focus on the physical “library as place,” with images of exteriors or interiors of the library building; “library 
as function,” with images of books or computers; and “library as conversation” with images of people 
conversing or interacting in classes and workshops. Visual representations of library Twitter profiles and 
the textual representations expressed in the biography field of the library’s profile are examined for 
impression management of information revealed and concealed in conveying a presence in social space. This 
study uses content analysis of the visual images selected as the library’s profile and background pictures 
and intercoder reliability testing in building a codebook of image descriptions for assessing public library 
visual self-representations. The study draws upon prior research into social profiles and impression 
management in Facebook and MySpace (Boyle & Johnson, 2010; Davies, 2012; Hum et al., 2011), but brings 
a new focus on Twitter and the social profile and online presence of public libraries. 

As libraries move toward providing information and services on Twitter and other social media 
sites, key unanswered questions remain in terms of how to measure “success” in social media, and what 
constitutes impact and influence in information sharing by public libraries in social spaces. For example, 
public libraries fulfill a complex mixture of roles including memory institution for preserving knowledge, 
community center for social engagement (Brophy, 2008, p.8), technology center delivering access to research 
resources, and community learning center or “people’s university.” Yet conveying these many aspects of the 
library’s role and “self” is constrained within the limits of social space in profile photos and backgrounds, 
and brief textual biographies on a Twitter profile. Text and visual representations will be explored for 
insights into key roles incorporated by public libraries into profile self representations, such as images 
emphasizing computers (“library as technology center”). 

This study examines facets of the activity and interactions of public libraries on Twitter in terms 
of total numbers of postings, original launch date, most recent posting, and other users followed by the 
library. For assessing the relative influence and impact of a library, this study uses metrics including total 
numbers of followers of the library’s Twitter page, and the library’s PeerIndex scores. A prior study (Mehta 
et al., 2012) has suggested that number of followers acquired may be a more reliable indicator of influence 
than retweets or inclusion on lists. PeerIndex scores, similar to Klout scores, are a popular social media 
influence metric composed of scores for areas including authority, activity and audience (del Campo Avila 
et al., 2013), which are aggregated into a single influence score ranging from 1 to 100. 


2 Conclusion 


Results of this study will identify influential public libraries on Twitter; provide new insights into how 
libraries present themselves, engage with users, and share information with the public on Twitter; and in 
particular will identify the information sharing and impression management practices of influential public 
libraries on Twitter. It is anticipated that this study will also provide insights into the lifecycle of a library’s 
presence in social spaces from birth to death, and the lingering remnants of a dormant or defunct social 
presence. Results will have implications for how public libraries can successfully engage with members of 
the public in social spaces, and raise issues for further study of how public libraries can move followers 
beyond less-engaged actions of following in the social sphere toward more direct engagement and 
involvement in mission-critical actions in the larger world. 


Abstract 

An increasing number of large digitized audio-visual collections within digital humanities have recently 
been made available for users. Often access to digitized audio-visual collections is hampered by little and 
inconsistent metadata. This paper presents the preliminary findings from a study of the search log in a 
radio broadcast archive. Firstly, results in relation to the identified types of search terms show that the 
Programme listing category was the most frequently identified category followed by categories of Person 
and Subject. Secondly, users rarely apply advanced search operators but instead apply phrase or single 
word queries. 


1 Introduction 


An increasing number of large digitized audio-visual collections within digital humanities have recently been 
made available for users. Often access to digitized audio-visual collections is hampered by little and 
inconsistent metadata (e.g., Hollink et al., 2009). This results in a number of challenges for the use of 
digitized collections both in research and in daily life. The purpose of this paper is to present the preliminary 
findings from a study of the search log in a radio broadcast archive, the LARM.FM-archive¹, and 
accordingly to gain insight into end users’ search behaviour in the context of digitized broadcast audio. In 
contrast to related previous studies (e.g., Huurnink et al., 2010) the present study focuses solely on access 
to digitized audio collections. The following research questions guided the study: 


Q1: How are the users issuing queries to the LARM.FM-archive? 
• What type of search terms can be identified? 
• What types of search operators are used? 


The LARM.FM-archive was launched in November 2012 as part of a joint initiative between the Danish 
national broadcasting corporation (DR), the State and University Library hosting the Danish Media 
Archive, and a consortium of Danish university humanities departments. The LARM.FM-archive provides 
streaming access to more than 1 million hours of Danish national, regional and local radio broadcasts from 
1925 onwards. In addition, the archive is seen as part of a research infrastructure that enables researchers 
to search and annotate the many recordings of the radiophonic cultural heritage and to communicate about 
and interact with radio broadcasts. 


1 http://www.larm-archive.org 


2 Related studies 


Despite the growing number of digitized collections within audio-visual archives (Wright, 2007), there are 
only a few studies of how end users search and interact with these collections. In a recent study, Huurnink et 
al. (2010) analysed media professionals’ use of an audio-visual archive through transaction log analysis. The 
results of the study show that (i) over half of the sessions were completed in less than 1 minute; (ii) nearly 
all of the queries contained a free text keyword search while almost a fourth specified a date filter, and program 
title was the most frequently occurring keyword search; and (iii) the advanced search options were used in 
only 9% of the queries. Huurnink et al.’s (2010) study took place within the context of a large broadcast 
archive containing both video and audio resources. They did not distinguish between the two media types, 
but the study revealed that less than 10% of the clicks and orders to the archive were for audio material. 
Accordingly, the results of the study must be skewed towards video as a media type. Therefore the results 
are not directly comparable to the results of the present study, but nevertheless considered relevant. In 
addition earlier studies have analysed written requests to audio-visual archives (Hertzum, 2003; Sandom & 
Enser, 2001) identifying users’ underlying information needs but not search behaviour. Finally, only a very 
limited number of user studies focus solely on audio archives. An exception is Kim et al. (2003) reporting 
on an exploratory study of the criteria searchers use when judging the relevance of recorded speech from 
radio programmes. 


3 Method and experimental design 


The LARM.FM-archive has 731 registered users as of September 2013 (Table 1). Due to copyright 
restrictions the LARM.FM-archive is not open to the public but only to researchers, university students 
and some professionals (librarians, archivists etc.). Unfortunately the user registration system has changed 
since the launch of the LARM.FM-archive and only details about occupation and institutional affiliations 
from 447 users are available. 


Occupation        Number of users    % of total 
Researchers                   133            18 
Students                      228            31 
Professionals                  86            12 
Unknown status                284            39 

Total                         731           100 


Table 1: The user groups in LARM.FM. The 284 users with unknown occupation are due to the change in 
the user registration system. 


The high number of students is due to the participation of students in research projects using LARM.FM 
as well as the use of LARM.FM for teaching and therefore it is expected that the composition of the user 
groups will change according to on-going research activities or teaching activities. 

The LARM.FM user interface is developed in Microsoft Silverlight, which in combination with 
the use of Google Analytics as the logging system limits the possibility to log all user actions in the interface. 
The implementation of Google Analytics in LARM.FM allows us to identify a user session, but it is not 
possible to identify the user behind a user session. Consequently, it is not possible to connect sessions to 
users or their characteristics, e.g. user groups. 

The radio programmes were imported into LARM.FM with sparsely populated metadata about 
each programme but the implemented metadata scheme allows users of LARM.FM to enrich the recordings 
(Lund, Bogers, Larsen & Lykke, 2013). As of September 2013 only 1086 of 578978 programmes have been 
enriched with user generated metadata. 

Log analysis is an unobtrusive way to collect large amounts of data on user search behaviour 
(Jansen, 2008). In this study we have analysed the usage log for the period from the 10th of August 2013 until 
the 9th of September 2013, both days included, resulting in approximately 15740 entries. To identify user entries 
containing a query formulation, the usage log was parsed with a sed script, resulting in 6837 entries 
that included a search string. Of these, 2827 were identified as unique user sessions where the user had 
performed a search in LARM.FM. 
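
The filtering step described above can be illustrated with a minimal sketch. The paper used a sed script on the actual usage log; the Python version below is only an analogous illustration, and the log layout (tab-separated session ID, timestamp and request), the `q=` parameter name and the sample entries are all assumptions made for the example, not the LARM.FM format.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical log line layout (an assumption, not the real LARM.FM log):
#   <session_id>\t<timestamp>\t<request path with optional ?q=... parameter>
QUERY_RE = re.compile(r"[?&]q=([^&\s]+)")


def extract_queries(lines):
    """Return (session_id, query) pairs for entries that carry a search string."""
    hits = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed entries
        session_id, _timestamp, request = parts
        match = QUERY_RE.search(request)
        if match:
            # Decode URL escapes such as %22 (double quote) and + (space).
            hits.append((session_id, unquote_plus(match.group(1))))
    return hits


# Invented sample entries, for illustration only.
sample_log = [
    "s1\t2013-08-10T09:00\t/search?q=%22radioavis%22",
    "s1\t2013-08-10T09:01\t/view?id=42",
    "s2\t2013-08-11T14:30\t/search?q=jazz+1965&channel=p1",
    "s2\t2013-08-11T14:31\t/search?q=jazz+1965&channel=p1",
]

hits = extract_queries(sample_log)
search_sessions = {session for session, _ in hits}
print(len(hits), "entries with a search string")
print(len(search_sessions), "sessions containing a search")
```

Keeping the session ID next to each extracted query is what allows entries with a search string to be reduced to unique search sessions afterwards.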

To identify the types of search terms used in the extracted query strings a coding scheme was 
developed based on a scheme used by Huurnink et al. (2010). The scheme divided queries into 8 categories, 
of which 5 categories (Location, Person, Name, Genre, Subject) were taken from Huurnink et al. (2010) and 
3 additional categories (Time/year, Title and Programme listing ID) were added to match the current 
study. Categorization of search terms was done independently by 2 of the authors of this paper. The results 
were then compared and the final category for each search term was decided accordingly. 
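
As a rough illustration of the comparison step, the sketch below splits two coders' independent assignments into agreed categories and disagreements that still need a joint decision. The function name, the sample queries and their codes are hypothetical, and the paper does not report an agreement figure; only the eight category names come from the coding scheme described above.

```python
# The 8 categories of the coding scheme described in the text.
CATEGORIES = {
    "Location", "Person", "Name", "Genre", "Subject",
    "Time/year", "Title", "Programme listing ID",
}


def compare_codings(coder_a, coder_b):
    """Split queries coded by both coders into agreed and disputed assignments."""
    agreed, disputed = {}, {}
    for query in coder_a.keys() & coder_b.keys():
        a, b = coder_a[query], coder_b[query]
        if a not in CATEGORIES or b not in CATEGORIES:
            raise ValueError(f"unknown category for query {query!r}")
        if a == b:
            agreed[query] = a
        else:
            disputed[query] = (a, b)
    return agreed, disputed


# Invented example codings, for illustration only.
coder_a = {"radioavisen": "Title", "copenhagen": "Location",
           "1965": "Time/year", "louis armstrong": "Person"}
coder_b = {"radioavisen": "Title", "copenhagen": "Location",
           "1965": "Time/year", "louis armstrong": "Name"}

agreed, disputed = compare_codings(coder_a, coder_b)
agreement = len(agreed) / (len(agreed) + len(disputed))
print(f"agreement: {agreement:.2f}")
print("to be resolved:", disputed)
```

The disputed entries are the ones the two coders would then discuss to settle on a final category.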


4 Results 


First we look at the type of search terms that can be identified. A total of 2827 search sessions were 
identified including 2470 unique queries and 357 queries where the same query formulation was repeated. 


Type                         # of queries    % of queries 
Time/year                             151               6 
Title                                 211               9 
Location                               75               3 
Person                                292              12 
Name                                   68               3 
Genre                                  68               3 
Subject                               357              14 
Programme listings                    645              26 
Unknown origin                        689              28 
Total # of occurrences               2556 

Total # of unique queries            2470 


Table 2: Queries according to list of categories 


Table 2 shows the frequency of the 8 categories identified in the 2470 unique user queries. The Programme 
listing category was the most frequently identified category (26%) reflecting a verificative search on a 
specific radio programme listing ID number. It is followed by the Subject (14%), Person (12%), and Title 
(9%) categories. Overall, Location, Name and Genre were the least frequently identified categories. 
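
The percentages in Table 2 can be reproduced from the counts. The short check below, using only the figures printed in the table, suggests that the shares are computed relative to the 2470 unique queries rather than the 2556 category occurrences; that reading is an inference from the arithmetic, not something the text states.

```python
# Counts and rounded percentages as printed in Table 2.
counts = {
    "Time/year": 151, "Title": 211, "Location": 75, "Person": 292,
    "Name": 68, "Genre": 68, "Subject": 357,
    "Programme listings": 645, "Unknown origin": 689,
}
reported = {
    "Time/year": 6, "Title": 9, "Location": 3, "Person": 12,
    "Name": 3, "Genre": 3, "Subject": 14,
    "Programme listings": 26, "Unknown origin": 28,
}

UNIQUE_QUERIES = 2470
total_occurrences = sum(counts.values())

# Share of each category relative to the number of unique queries.
shares = {name: round(100 * n / UNIQUE_QUERIES) for name, n in counts.items()}

print("total occurrences:", total_occurrences)
print("matches reported percentages:", shares == reported)
```

The occurrence total exceeds the number of unique queries, which is consistent with some queries having been assigned to more than one category.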

The LARM.FM front-end allows for entering a search string or a direct search on date (either single 
date or date range), content type (radio shows and program listings), user generated content (user enriched 
metadata) and broadcast channel. The search box for entering a search string does not provide any help 
feature about the search syntax used. The back-end of LARM.FM is built on the Apache Solr² search platform, 
and search phrases are delimited by double quotes. The default operator in LARM.FM is OR. 

Table 3 shows the type of delimiters and search operators used. The results show that the large 
majority of queries are issued as either a phrase search or a single word string, whereas use of advanced 
search features such as Boolean operators or truncation is limited. The 177 multi-word strings were 
probably intended as phrase searches and are possibly due to users who are not aware of the correct search 
syntax. 


2 http://lucene.apache.org/solr/ 


                                      Result 
Phrase search using delimiter           1202 
Word (single word string)               1058 
Word (multiple words)                    177 
Operator (NOT)                             1 
Operator (AND)                            53 
Truncation (*)                            19 


Table 3: Delimiters and operators used 
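
To make the distinctions in Table 3 concrete, the following sketch assigns a query string to one surface form. It relies only on the syntax mentioned in the text (double quotes for phrases, uppercase Boolean operators, `*` for truncation); the function name, the precedence order between the rules and the sample queries are assumptions, and a query that combines several features receives only the first matching label here.

```python
import re


def classify_query(query):
    """Assign a query string to one of the surface forms counted in Table 3.

    Simplification (an assumption): only the first matching rule is reported,
    so a quoted phrase combined with AND is labelled as a phrase search.
    """
    tokens = query.split()
    if not tokens:
        return "empty"
    if re.search(r'"[^"]+"', query):
        return "phrase search"
    if "NOT" in tokens:
        return "operator NOT"
    if "AND" in tokens:
        return "operator AND"
    if "*" in query:
        return "truncation"
    return "single word" if len(tokens) == 1 else "multiple words"


# Invented example queries, for illustration only.
examples = ['"radioavisen 1965"', "jazz", "danmarks radio",
            "jazz AND 1965", "radio*", "jazz NOT pop"]
for q in examples:
    print(f"{q!r:24} -> {classify_query(q)}")
```

Under the default OR operator, an unquoted multi-word string such as the third example is matched on any of its words, which is why such strings are read in the text as probable attempts at phrase searching.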


5 Discussion and conclusion 


The findings correspond to previous results on end-user searching: end users still make 
simple, short queries with few free-text search terms and little use of advanced features. 

The present paper presents the preliminary results regarding how users issue queries to the 
LARM.FM-archive based on log analysis. The preliminary results indicate that parallels can be drawn to 
the log analysis study by Huurnink et al. (2010). Firstly, in relation to applied search terms, the categories 
of Person and Subject are more frequently applied than Location, Name, and Genre. Secondly, users rarely 
apply advanced search operators but instead apply phrase or single word queries. Infrequent use of advanced 
search operators has previously been identified in relation to audio-visual material (Huurnink et al., 2010) and in search 
behaviour in general (Markey, 2007). The analysis of user queries further indicated a surprising interest 
in music-related radio broadcasts. Future log analysis on user behaviour in LARM.FM will study a larger 
time span (one year) to determine whether the music-related interest can be explained by on-going research 
activities or whether it is a more general characteristic. Future work should also include how queries evolve 
during a search session, how date and programme filters are used to limit searches, and an analysis of how 
users tag and annotate radio programmes. 


Abstract 

The poster explores documentary practices in web environments where credibility is constructed and 
agreed upon. Based on studies of open peer review processes in scholarly journals and of discussions of 
credibility in comments to a climate change blog, four dimensions of credibility assessment activities are 
identified: gatekeepers/open participation; formal credibility assessment /intrinsic plausibility; individual 
credibility assessment/collective credibility assessment; and experts/laymen. Within each dimension, 
various positions and tensions with regard to credibility are exemplified. It is concluded that whether or 
not participation in credibility assessments, or review, becomes a collective activity within a documentary 
practice depends on the interaction between the affordances of the inscription technologies, social 
affordances and institutional practices. 


1 Introduction 


Web technology facilitates making documents public and allowing people to communicate quickly and 
many-to-many around the documents. As a consequence of the ease of publishing, credibility assessments 
on the web often take place after rather than before a document has been made public. In the poster, some 
consequences of such changing conditions for the documentary practices in which credibility is constructed 
and agreed upon are explored. Four dimensions of credibility assessment activities are identified based on 
previous literature and on analysis of examples drawn from review practices in primarily two genres: 
scholarly journals and blogs. The primary examples come from open peer review initiatives in scholarly 
journals and from a study of participants’ conversations in a blog. Within each dimension, various positions 
with regard to credibility are exemplified through these empirical studies. 


2 Theoretical perspective 


The perspective applied considers web credibility to be a product of historically situated interactions taking 
place within a set of activities performed by a particular group of people involving certain types of 
documents or other tools, i.e. as part of specific (documentary) practices (Frohmann, 2004). 


3 Data collection and methodological considerations 


The analysis comes out of the author’s current and previous research (e.g. Francke, 2008; Francke & Sundin, 
2010; Francke, 2012). The research involves qualitative, explorative studies of web documents and 
documentary activities, primarily with a focus on scholarly journals and on blogs. 

The study of open peer review initiatives in scholarly journals was initiated in Francke (2008) and 
later expanded through analysis of pertinent examples chosen from the past few years. The blog data were 
collected as part of a larger study of blogging activities in which nine bloggers writing about environmental 
issues and current affairs participated. These are areas where conflicting views may exist and, as a result, 
the credibility of the blogger and of her sources is likely important. Those parts of the blog texts that in 
some way concerned credibility or the use of sources were collected and analyzed thematically. The analysis 
made here is based mainly on discussions that took place in comments to one of the blogs. The bloggers 
gave their informed consent, but consent was not gathered from the (sometimes anonymous) people 
commenting on posts. For this reason, no direct quotes have been used in the poster. 


4 Dimensions of credibility 


Below, four dimensions of credibility assessment activities in web environments that have emerged through 
analysis are described with illustrating examples. 


4.1 Gatekeepers ↔ Open participation 


Traditionally, people have relied strongly on gatekeepers to help decide not only which documents are 
relevant, but also which are credible. Gatekeepers historically include, for instance, editors, librarians, and 
reviewers. In a media environment where documents less frequently go through such gatekeepers before 
reaching the public, other trusted parties, such as bloggers, become gatekeepers. On a larger scale, Henry 
Jenkins (2006, pp. 17 f.) has pointed to this tension as characterizing much of the modern media 
environment: 


on the one hand, new media technologies have lowered production and distribution costs, expanded 
the range of available delivery channels, and enabled consumers to archive, annotate, and 
recirculate media content in powerful new ways. At the same time, there has been an alarming 
concentration of the ownership of mainstream commercial media [...]. 


In the area of scholarly journal publishing, established publishers try to come across as attractive by 
portraying themselves as gatekeepers, not least through the system of rigorous peer review. But there are 
also journals which have tried to address frequent critiques raised towards the peer review system by trying 
to design a more transparent system. For instance, BioMed Central’s BMC Medicine has implemented a 
system for open peer review, where the author knows the names of the reviewers and, if the article is 
accepted, all versions of the article are published along with the review comments and the authors’ responses 
to these (BioMed Central, 2013). Nature tested a system where anyone could provide a review to a selected 
number of contributed articles (Nature, 2006). Furthermore, the Journal of Interactive Media in Education 
offered a combination of these two systems for a few years before they changed to a more traditional system. 
A more radical example is the journal Philica, which accepts contributions in all disciplines. The 
journal publishes the manuscripts as soon as they are submitted and puts them up for review by anyone 
who feels so inclined. The reviews are visible to everyone. However, the number of reviews is so far limited. 
Only one of the three established journals, BMC Medicine, which is also the one where the system is least 
open for anyone’s participation, continues to implement the open peer review system. Based on these 
examples, one can argue that open participation in the assessment of quality has not become an integrated 
part of the documentary practices of the scholarly community. 


4.2 Formal credibility assessment ↔ Intrinsic plausibility 


In Patrick Wilson’s analysis in Second-hand Knowledge (1983) of how we determine who or what is a 
cognitive authority to us, he suggests that when it comes to evaluating documents as potential cognitive 
authorities, we take our departure in the documentary practices of a document’s genesis and use. This 
includes how well regarded the author of the document is, the various activities through which the document 
is produced, distributed and evaluated, and how well the values and beliefs expressed agree with our own 
— the document’s intrinsic plausibility. 

iConference 2014 Helena Francke 

Even in social media, credibility is often associated with formally published sources. An example, 
mirroring web credibility guidelines addressed to students, is the concern that sources should have been 
formally assessed, which is expressed in Wikipedia’s policies on Verifiability and No original research 
(Wikipedia, 2013a; 2013b; see also Sundin, 2011). In comments to posts in one of the climate change blogs 
analyzed, factors having to do with a document’s author and production history (Wilson, 1983) were also 
prominent. A number of discussions in the comments focused on the credibility of peer reviewed scholarly 
articles and of newspaper articles. Individual authors and groups of authors, scientific and journalistic 
conduct, publishers, and quality control systems were drawn upon in the comments as supporting or limiting 
the trustworthiness of the documents discussed. 

Furthermore, the blog participants relied strongly on what they found intrinsically plausible, in 
particular whether or not the views — epistemological or political — on climate change represented in the 
document were shared by the reader. Kaye and Johnson (2011) have shown that political values are an 
important factor in how blog readers attribute credibility to various types of blogs. The important role 
played by intrinsic plausibility when these blog readers assessed the credibility of articles concerned with a 
highly contested political topic supports those findings. What were constructed as general understandings 
within the practice of the blog strongly shaped and co-constructed which sources were viewed as credible 
and which arguments were considered to be valid. 


4.3 Individual credibility assessment ↔ Collective credibility assessment 


The blog participants often collaborated in the commenting field to determine what made a source more or 
less credible; credibility was not — or not solely — something which was considered predetermined by previous 
reputation (Metzger & Flanagin, 2008). The negotiations between blog participants served to affirm or 
perpetuate already held beliefs within the community and to convince somebody with opposing views. 

It could thus be argued that what we see in the example of the blog discussion, just as in the talk 
pages of Wikipedia, approaches collective credibility assessment. However, unlike the collective assessment 
in tabulated credibility (Metzger & Flanagin, 2008), where peer ratings provide a metric of credibility, this 
is a case of qualitative, discursive credibility assessment. Through the interaction taking place, the 
assessment also differs from the separate peer reviews published in BMC Medicine, or prepared for 
traditional scholarly journals, which make up a collection of individual assessments rather than collective 
assessments. 


4.4 Experts ↔ Laymen 


Another aspect of the discussion in the blog comments is that the participants relate to and partly question 
the idea of ‘experts’ and non-experts or laymen. It is important to point out that the blog comments 
analyzed here took place on a site which gathered a mix of expertise: participants included both those who 
could be considered experts, with academic and/or other professional merits in relevant areas, and people 
whose knowledge in the area was less institutionalized, and also those who were merely ‘curious’. 
Occasionally, the difference could be difficult to determine and participants experienced a need to clarify, 
as when somebody used an academic title and was challenged to state whether the title was in a relevant 
academic discipline. In this case, the problem was primarily a matter of assessing the credibility of particular 
participants in the discussion, so that their contribution to the collaborative credibility assessment of a 
document could be evaluated. 


5 Concluding discussion 


The four dimensions presented illustrate the complexities involved in assessing credibility on the web but 
also some cultural tools that are being applied. Furthermore, the examples illustrate some of the power 
relations at play in coming to grips with credibility. For instance, the “material texture” (Foucault, 2002, 
p. 115) of the blogging software used in the main example allows for comments and questions to be posed 
and read by any reader, and the discussion moves to a public arena where blog participants with varying 
subject knowledge both draw on and question the practices of scientists and journalists — traditional 
gatekeepers or experts. Thus, an activity such as peer review, which could be argued to have been a site for 
mainly ‘intra-practice’ genre discussions, is increasingly changed from the outside (Bazerman, 1988, p. 308) 
by the technological affordances and associated genre practices of the blog, and by the fact that scholarly 
journals and newspapers are often available online and can be hyperlinked to. 

However, as the examples above show, technical affordances are not enough to introduce change; if 
the values, beliefs and motivations of the discourse community do not support change, it will have difficulties 
gaining ground (Kling & McKim, 2000). Whether or not participation in credibility assessments, or review, 
becomes a collective activity within a documentary practice depends on the interaction between the 
affordances of the inscription technologies, social affordances and institutional practices. 


Abstract 

Digitization has become a widely adopted technique for preserving content stored on decaying physical
carriers. Understanding the epistemic conditions within which digitization standards develop is an
unexplored area of research. This project takes a case-study approach to look at the JPEG2000 standard and
the discussion concerning its ability to support a suitable preservation format for analog video content.
Following Keller’s (2005) sociology of knowledge approach to discourse (SKAD) methodology, this project
analyzed 433 messages gathered from the time period 2000 to 2013 on the public discussion board of the
Association of Moving Image Archivists (AMIA). This research suggests that for this knowledge community,
standards produce contested technological configurations. This research identifies two dominant
competing interpretive frames: a frame of technological innovation and a frame of institutional
integration. The author suggests that considering epistemic registers provides a useful approach for
understanding the process of adopting standards for preservation.

1 Introduction


Standards increasingly impact work done in preservation institutions. Donaldson and Yakel (2013), in their
work on the adoption of the metadata standard PREMIS, suggest “standards in archives are ubiquitous.
They reflect the most current knowledge about professional practices and increase interoperability,
consistency, and the safety and security of collections” (pp. 55-6). Digitization standards are particularly
important, as more and more institutions are adopting digitization as a strategy for access and long-term
preservation. Paul Conway (2010) suggests that “in the age of Google, nondigital content does not exist,
and digital content with no impact is unlikely to survive” (p. 64). Digitization is providing access to the
archives of the future, and standards are effectively shaping how collections will appear to future
generations. The following research considers one particular standard for preserving collections, JPEG2000, and
the debate surrounding its adoption within the moving image preservation community. This research is
intended to draw attention to the role standards play in the social distribution of knowledge (Berger and
Luckmann, 1966), in particular, how standards shape the storage and presentation practices of
preservationists in collecting institutions.


2 Background 


2.1 Sociology of Standards 


While standards are typically overlooked in our society, they impact us on a daily basis (Busch, 2011, pp. 1-2).
A variety of scholars have argued for a close sociological reading of standards as key components of
institutional infrastructure and important players in the constitution of knowledge production (Bowker and
Star, 1999; Brunsson and Jacobsson, 2000; Busch, 2011; Lampland and Star, 2008; Latour, 1987).
Timmermans and Epstein (2010) point out that the process of standardization itself can be seen as a form of
knowledge production, suggesting “standardization also raises questions about the role of science and
expertise in regulation: What evidence is sufficient or necessary to implement standards?” (p. 70). These
approaches demonstrate the close relationship between standards and knowledge, suggesting that the sociology
of knowledge may be a fruitful theoretical approach for research into the sociology of standards.


2.2 What is a Standard? 


Busch (2011) suggests that “standards are where language and the world meet,” implying that the
phenomena that we call standards straddle the area between what can be linguistically specified and categorized
and the properties of the physical world (p. 3). Standards also invoke categories and make distinctions
about things and actions in the world, shaping infrastructure and institutions (Bowker and Star, 1999). In
the case of formal technical standards, such as JPEG2000, they are represented in written documents
developed through standardizing organizations (e.g. the International Organization for Standardization).
These documents may be interpreted by engineers to create specifications for producing objects for
commercial distribution. Standards considered at the level of a community define acceptable objects and
practices for the carrying out of community practices. Consequently, standards may be considered at the level
of documents, technologies, or practices. This research considers the interplay between standards as
technology and standards as practices, focusing on the epistemic means by which the JPEG2000 standard is
discussed and evaluated by the moving image preservation community.


3 Problem Statement / Research Questions 


Antoinette Burton (2005) points out that archival institutions “are not just sources or repositories as such,
but constitute full-fledged historical actors as well” (p. 7). Applying this concept to current preservation
practices, it becomes clear that the technical process of digitizing documents is not a value-neutral act, but
an active shaping of the historical record. This raises the question of how digitization standards become
adopted, and in particular, how standards are produced within specific epistemic regimes. To begin an
inquiry into these critical issues, this project proposes the following research questions:


• How is JPEG2000 constructed as a preservation format within the discourse of the moving image
preservation community?

• What interpretive frames are invoked and how are knowledge claims reinforced or contested? What
epistemic registers are invoked in the consideration of this standard?

• Do the concerns over what counts as sufficient knowledge for evaluating preservation formats change
over time within this community?


4 Methods and Methodology 


4.1 Sociology of Knowledge Approach to Discourse 


This research uses the sociology of knowledge approach to discourse analysis (SKAD) developed by Keller
(2005), with grounded theory methods (Charmaz, 2006) to look at a corpus of messages posted on the
Association of Moving Image Archivists listserv between the years 2000 and 2013, which discuss the
potential role of the JPEG2000 format for preservation. The purpose of SKAD is to “analyze ongoing and
heterogeneous processes of the social construction - production, circulation, transformation - of knowledge”
(Keller, 2005, p. 7). SKAD synthesizes the work of Michel Foucault (1970) on discourse analysis with Berger
and Luckmann’s (1966) sociology of knowledge. Conceptualizing AMIA-L as a site of knowledge production,
this research looks closely at how the epistemic grounds for claims about JPEG2000 are constructed through
discourse.

4.2 Selection of Data Source

The Association of Moving Image Archivists’ (AMIA) listserv, AMIA-L, was selected as a source of data
because of its importance to this community. It has historically been, and continues to be, closely
watched by individuals involved in the preservation of moving image media, and it attracts discussants
from a wide range of backgrounds, including archivists, engineers, corporate vendors, librarians and
students. A total of 502 messages posted between 2000 and 2013 were retrieved from the publicly available
archives of AMIA-L using the search terms “JPEG2000” or “JPEG 2000.” Of the messages retrieved, 67 were
duplicates or were on topics not directly related to JPEG2000. These messages were excluded from the data
corpus, leaving 435 (86% of the total messages collected) for analysis. See Figure 1 for the distribution of
messages over the period of this study.

Figure 1: Messages Analyzed on AMIA-L


4.3 Data Analysis 


The key elements to analyze in research following SKAD include looking at interpretive frames,
classifications, phenomenal structure and narrative structure within the discourse surrounding this standard (Keller,
2005). Grounded theory tools, such as “coding, commentaries and memos” (Keller, 2013, p. 109) are quite
helpful in conducting discourse analysis. This project utilizes these tools, coding individual units of
enunciation at the word, sentence, and message level, followed by writing memos, recoding documents, and
aggregating codes into larger categories for reporting.

After duplicates and off-topic messages were excluded from the analysis, the first pass through the
data corpus sought to identify the classification criteria utilized by discussants to make arguments for or
against JPEG2000 as a preservation format. In addition, techniques of argument and the presentation of
evidence were identified. Subsequent data analysis sessions began to group framing techniques together with
epistemic techniques, to form the key interpretive frames identified by this research. In the process of
coding, not every message received the same amount or type of analytic focus. Initial analysis focused
primarily on clearly positive or clearly negative messages directed at JPEG2000, but it became clear that
seemingly neutral messages, which did not take a decisive stand for or against JPEG2000, contributed to
understanding the context within which agonistic statements were enunciated. Questions posed to the
listserv required considerable judgment on the part of the researcher because they could be
interpreted both as queries for more information on a certain topic and as questions designed to call
into question certain aspects of JPEG2000. Considering the temporal dimension (i.e. coding messages in the
context of past and future messages) was essential to making sense of how questions posed to the listserv
functioned to promote particular interpretive frames within the context of the ongoing discussion.


5 Findings and Analysis 


5.1 Structure of Data Corpus 


Figure 1, above, shows the number of messages per year, suggesting a sudden ramp up of discussion from
2004 to 2005, with peak levels of JPEG2000 discussion in 2009 and 2013. Table 1, below, aggregates the
institutional affiliation of the discussants for each year. The particular identities of discussants are not
directly relevant to the SKAD approach since the concern of this analysis lies in looking at the techniques
of knowledge construction and the rules by which knowledge can be formed. However, Table 1 is helpful
for providing context and for understanding which segments of the moving image preservation community
were actively participating in the discussion. It is significant to the analysis that a small group of discussants
appears to be responsible for guiding the discussion. Only 41 discussants engage in the discussion on
JPEG2000 over the nearly 14 years of discussion, suggesting that the discourse around JPEG2000 is guided
by only a few key figures. The AMIA-L “audience,” however, is much larger, constituted by the over 800
individual AMIA members, plus members of the general public who subscribe to the listserv. The presence
of a small group of discussants suggests a centralization of power in terms of who can construct the
parameters of discourse, which bears further analysis.
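
The yearly pattern described above can be checked against the Total row of Table 1. The following is a minimal sketch in Python, assuming the yearly totals exactly as printed in that row; the function and variable names are illustrative and not part of the original study.

```python
# Messages analyzed per year, as printed in the "Total" row of Table 1.
MESSAGES_PER_YEAR = {
    2000: 2, 2001: 2, 2002: 0, 2003: 2, 2004: 2, 2005: 41, 2006: 22,
    2007: 34, 2008: 17, 2009: 105, 2010: 69, 2011: 28, 2012: 39, 2013: 70,
}


def peak_years(counts: dict[int, int], n: int = 2) -> list[int]:
    """Return the n years with the most messages, in chronological order."""
    ranked = sorted(counts, key=lambda year: counts[year], reverse=True)
    return sorted(ranked[:n])


def text_histogram(counts: dict[int, int], per_mark: int = 5) -> str:
    """Render one line per year, with one '#' per `per_mark` messages."""
    return "\n".join(
        f"{year} {'#' * (counts[year] // per_mark)} {counts[year]}"
        for year in sorted(counts)
    )


if __name__ == "__main__":
    print(text_histogram(MESSAGES_PER_YEAR))
    print("peak years:", peak_years(MESSAGES_PER_YEAR))
```

The text histogram is only a rough stand-in for the distribution shown in Figure 1; the bar scale (`per_mark`) is an arbitrary choice.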


Affiliation         | 2000     | 2001     | 2002     | 2003     | 2004     | 2005     | 2006     | 2007     | 2008     | 2009     | 2010     | 2011     | 2012     | 2013     | Total
Vendors¹            | 0        | 1 (50%)  | 0        | 0        | 0        | 21 (49%) | 15 (68%) | 19 (56%) | 11 (65%) | 69 (66%) | 42 (61%) | 16 (57%) | 18 (46%) | 40 (59%) | 252 (58%)
National Archives²  | 2 (100%) | 1 (50%)  | 0        | 0        | 1 (50%)  | 4 (9%)   | 3 (14%)  | 1 (3%)   | 1 (6%)   | 7 (7%)   | 9 (13%)  | 5 (18%)  | 6 (15%)  | 8 (12%)  | 48 (11%)
Medium Archives³    | 0        | 0        | 0        | 0        | 0        | 2 (5%)   | 1 (5%)   | 1 (3%)   | 0        | 0        | 1 (1%)   | 1 (4%)   | 2 (5%)   | 3 (4%)   | 11 (3%)
Small Archives⁴     | 0        | 0        | 0        | 0        | 0        | 3 (7%)   | 0        | 1 (3%)   | 4 (24%)  | 20 (19%) | 3 (4%)   | 2 (7%)   | 1 (8%)   | 5 (7%)   | 39 (9%)
Academic Libraries  | 0        | 0        | 0        | 1 (50%)  | 0        | 3 (7%)   | 1 (5%)   | 2 (6%)   | 1 (6%)   | 3 (3%)   | 0        | 0        | 0        | 1 (1%)   | 12 (3%)
Researchers         | 0        | 0        | 0        | 1 (50%)  | 0        | 2 (5%)   | 0        | 0        | 0        | 3 (3%)   | 7 (10%)  | 0        | 0        | 2 (3%)   | 15 (3%)
Video Production⁵   | 0        | 0        | 0        | 0        | 0        | 3 (7%)   | 1 (5%)   | 1 (3%)   | 0        | 1 (1%)   | 0        | 1 (4%)   | 1 (3%)   | 0        | 8 (2%)
Students            | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 1 (3%)   | 0        | 0        | 0        | 2 (7%)   | 2 (5%)   | 0        | 5 (1%)
Gov't Agency        | 0        | 0        | 0        | 0        | 0        | 2 (5%)   | 0        | 2 (6%)   | 0        | 0        | 0        | 0        | 0        | 0        | 4 (1%)
University Admin    | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 0        | 2 (3%)   | 0        | 4 (10%)  | 3 (4%)   | 9 (2%)
Other/Unknown       | 0        | 0        | 0        | 0        | 1 (50%)  | 3 (7%)   | 1 (5%)   | 6 (18%)  | 0        | 2 (2%)   | 5 (7%)   | 1 (4%)   | 5 (13%)  | 8 (9%)   | 30 (7%)
Total               | 2        | 2        | 0        | 2        | 2        | 41       | 22       | 34       | 17       | 105      | 69       | 28       | 39       | 70       | 433

Table 1: Institutional Affiliation of Discussants on AMIA-L, by Year (% of yearly discussion in brackets)

¹ “Vendors” are commercial entities that provide preservation technologies and services to archives, libraries and museums.

² “National Archives” includes U.S. and international archives and libraries with large collections and financial support from national
governments.

³ “Medium archives” cover regional, state and medium-sized specialty collections.

⁴ “Small archives” are typically local archives with limited funding.

⁵ “Video production” includes video production companies as well as corporate and non-profit broadcasters.


Looking at Table 1, it is clear that the discussion is dominated by vendors who provide preservation
equipment or services (producing 58% of total messages). Their ability to form knowledge claims depends on
their presentation of technical knowledge and experience in serving the community’s preservation needs on
an ongoing basis. These discussants have a financial incentive to shape the discussion to their advantage.
However, if they are to be successful in encouraging others to adopt JPEG2000 or one of several competing
technologies for digitizing video, they must make their arguments within the discursive constraints of the
greater moving image preservation community. After the participation of vendors, individuals associated
with national archives (11%) and small archives (9%) contributed the most, on an individual basis, to the
discussion. If we aggregate the discussants associated with academic institutions (combining academic
libraries, researchers, students and university administrators) we can see that “academia” accounted for 9%
of the total messages. Together these three groups, national archives, small archives and academia, accounted
for only 29% of the discussion. Vendors clearly have considerable influence over how the discourse is
constructed in this forum, yet in the following analysis, it will be clear that vendor-domination is only one
narrative that can be identified from the discursive construction of JPEG2000.
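
The shares quoted above can be re-derived from the Total column of Table 1. The following is a minimal sketch in Python, assuming the row totals as printed in the table; the names `TOTALS`, `ACADEMIA` and `share` are illustrative and not part of the original study. Note that the combined 29% corresponds to adding the individually rounded shares (11% + 9% + 9%); computed from the raw counts, the three groups together account for about 29.6% of the messages.

```python
# Row totals from the "Total" column of Table 1 (messages per affiliation, 2000-2013).
TOTALS = {
    "Vendors": 252,
    "National Archives": 48,
    "Medium Archives": 11,
    "Small Archives": 39,
    "Academic Libraries": 12,
    "Researchers": 15,
    "Video Production": 8,
    "Students": 5,
    "Gov't Agency": 4,
    "University Admin": 9,
    "Other/Unknown": 30,
}

# Grouping used in the text to form the aggregate "academia" category.
ACADEMIA = ("Academic Libraries", "Researchers", "Students", "University Admin")


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


corpus_size = sum(TOTALS.values())            # all analyzed messages
academia = sum(TOTALS[k] for k in ACADEMIA)   # aggregated academic discussants

vendors_pct = share(TOTALS["Vendors"], corpus_size)
national_pct = share(TOTALS["National Archives"], corpus_size)
small_pct = share(TOTALS["Small Archives"], corpus_size)
academia_pct = share(academia, corpus_size)

# Sum of the individually rounded shares, matching how the text combines them.
combined_rounded = round(national_pct) + round(small_pct) + round(academia_pct)

if __name__ == "__main__":
    print(f"corpus size: {corpus_size}")
    print(f"vendors: {vendors_pct:.1f}%")
    print(f"national archives: {national_pct:.1f}%")
    print(f"small archives: {small_pct:.1f}%")
    print(f"academia: {academia_pct:.1f}%")
    print(f"combined (sum of rounded shares): {combined_rounded}%")
```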


5.2 Battle of Interpretive Frames 


The analysis suggests that technological innovation and institutional integration are competing interpretive
frames vying for dominance during the period analyzed. The technological innovation frame is characterized
by the idea that its status as an internationally recognized standard and its advanced features make
JPEG2000 the most appropriate solution for preserving analog video content. This frame is also supported
by concern for the integrity of moving images as evidence. JPEG2000, framed as a standards-based
“mathematically lossless” technology for compressing visual information, constructs digitization as a process of
exact copying, where fidelity to an original is of paramount concern. Using lossy compression formats, which
lose visual information through the encoding process, represents a corruption of the information contained
within the analog video signal. JPEG2000 is reported to be a reversible compression process, and the
particular implementation under consideration supposedly captures non-visible information such as line 21
information for closed captioning and vertical interval time code information used in video editing.

This frame also positions the innovative technology both as fitting a set of pre-determined, presumably
universal criteria and as the next rational step in a narrative of technological progress.


“JPEG2000 is a standard that has many advantages over the others you mentioned - one of the 
very important aspects is the ability to generate low and high resolution proxies from the same file 
without having to store additional copies. ... I strongly recommend a hard look at JPEG2000. 
JPEG2000 forms the basis of the file format for Digital Cinema and has a strong following in many 
fields” (AMIA-L, 5/8/05). 


In the above example, we can see how JPEG2000 is positioned as an authoritative standard that possesses
new features and has been adopted by other significant communities concerned with digital imaging.

The competing interpretive frame, institutional integration, suggests that solutions should be considered
in light of the institution’s specific context, including its infrastructure, users, funding sources and
size. For instance, a very large institution such as the Library of Congress can afford to buy the complex
technology needed to support preservation formats built on the JPEG2000 standard, while smaller
institutions may not be able to follow due to social, economic or infrastructural limitations. This frame effectively
rejects JPEG2000 as a viable standard for widespread community adoption, instead proposing a variety of
alternative approaches based on specific institutional configurations.


“I don't think the answer is as simple as saying 'choose one codec over another'. There are too
many considerations to take into account, and too many unknowns. ... it is mentioned that this will 
be for preservation, but what is the future use of the files? How much storage does the archive have 
available? How much future storage can they afford? ... You first must start by understanding your 
users, and having a firm grasp on your goals. You can't just say that one format is the best for all 
applications” (AMIA-L, 9/13/05). 


In response, the supporters of the institutional integration frame began marshalling an assault on the various 
technological claims and the purported innovative qualities espoused by the JPEG2000 creators. 


“THERE IS NO INDUSTRY CONSENSUS on the best way to archive video. The path an organization
takes is determined by a large number of factors. Among the most important are resources,
personnel and the goals/mandates of the archive. It is great that the cost per gig of storing video
on drives and data tape are coming down. For preservation purposes, that cost may not be the
most important to the archive” (AMIA-L, 2/10/09, emphasis in original).


The use of capital letters in the beginning of this quote works to emphatically question the status of
JPEG2000 as a standard, based on the presentation of evidence indicating that it has not been widely
adopted. The above example illustrates how JPEG2000 is positioned by detractors as an inflexible
technology that does not easily integrate with the institutional context or fulfill the needs of expected users of
digitized content. Supporters of JPEG2000 respond to these types of attacks on JPEG2000 by minimizing
the ongoing problems identified by opponents as evidence against JPEG2000, such as issues with
interoperability (i.e. being able to properly play back digitally encoded content on different systems):


“I do not know of what ‘inop problems’ you are talking about with JPEG2000 at this point in time.
4 or 5 years ago there were just a few... ummm... uhhh... proponents [...] but at this point in time 
there are multiple levels of support from multiple vendors, and many projects are now specifying 
j2k as the preservation format of choice for SD video” (AMIA-L, 4/5/11). 


In addition, JPEG2000 supporters draw on the assumptions of preservation ethics to enhance their claims
that JPEG2000 is a technology that supports the existing standards of the archival community. Examining
other technologies by looking at the quality of images they produce is seen as an unacceptable manner of
assessing quality in digitization: “I have a very hard time supporting the ‘I can't see a difference so it's fine’
philosophy of AV Archiving” (AMIA-L, 2/27/10). Such messages imply that employing formats other than
lossless JPEG2000 compression is an indication of a flawed approach to archival preservation. By
repositioning the innovation as a marker of adherence to preservation ethics, the technological innovation frame
constructs the use of JPEG2000-based formats as a way of differentiating the true preservationists from
those doing what is merely expedient.

In the last two years of the period of study (2012-2013), detractors posed questions concerning
key technological claims of JPEG2000 supporters, which worked to degrade the epistemic support of the
innovation frame itself.


iConference 2014 Zack Lischer-Katz 


“Is someone able to point me towards any reports or documentation that detail the evidence for
Jpeg2000 offering true lossless compression? I have realized that while I have many references 
around this, none of them detail actual testing that has proven this” (AMIA-L, 4/10/13). 


By calling into question the basic technical attributes of JPEG2000, detractors seek to destabilize the
existing knowledge on JPEG2000, specifically its technical ability to produce losslessly compressed files.
By casting doubt on this essential attribute, this seemingly innocuous question problematized the purported
innovation’s technological attributes.
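
What the poster asks for is, in principle, a round-trip test: a compression scheme is lossless only if decoding the compressed data reproduces the original bit for bit. The sketch below illustrates that check in Python. It is not a JPEG2000 test: the standard-library `zlib` codec stands in for a lossless encoder, a simple quantization step stands in for a lossy one, and the synthetic `frame` data and all function names are assumptions made purely for illustration.

```python
import hashlib
import zlib


def sha256(data: bytes) -> str:
    """Hex digest used to compare original and decoded data."""
    return hashlib.sha256(data).hexdigest()


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (stand-in for a lossless codec)."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(data: bytes, step: int = 8) -> bytes:
    """Quantize each byte to a multiple of `step` (stand-in for a lossy codec)."""
    return bytes((b // step) * step for b in data)


def is_bit_exact(original: bytes, decoded: bytes) -> bool:
    """True only if the decoded data matches the original bit for bit."""
    return len(original) == len(decoded) and sha256(original) == sha256(decoded)


# Synthetic 64x64 8-bit "frame": a repeating gradient, not real video data.
frame = bytes((x + y) % 256 for y in range(64) for x in range(64))

if __name__ == "__main__":
    print("lossless round trip bit-exact:", is_bit_exact(frame, lossless_roundtrip(frame)))
    print("lossy round trip bit-exact:", is_bit_exact(frame, lossy_roundtrip(frame)))
```

A check of this kind shows only that one input survived one encoder and decoder pair; it says nothing about other implementations, which is the separate question of interoperability.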


5.3 Epistemic Techniques 


Analysis of the data suggests the following forms of knowledge techniques are mobilized to support or refute 
JPEG2000 as a standard: 


• Statement of fact: Statement to be taken as true evidence, often supported by expert positioning
(see below).

• Anecdotal evidence: Personal experience offered as evidence, typically bolstered by positioning as
expert or long-time member of the community.

• Rhetorical flourishes: Humor, irony, tropes, emoticons, etc. used as a convincing gesture, encouraging
trust, or lightening tension in heated debate.

• Instrument-based knowing: Use of machines, scopes, algorithms, etc. to verify truth claims. The
dominant innovation frame elevates machine vision and calculations, while denigrating educated
human perception as a reliable verifier of truth claims and quality.

• Documentary evidence: Discussants typically provide links to external documents or webpages;
supply copies of code or computer output as evidence; reference documents; or provide footnotes to
support arguments.

• Calculations: Discussants will often write out the steps of a basic arithmetic calculation as a
performative display of evidence to support a particular argument.


Two additional techniques, expert positioning and querying, were identified that deserve special
explanation. Expert positioning is an emergent theme in which a discussant positions himself or herself
as an expert by using anecdotal evidence, such as statements about length of involvement in the
field, association with notable organizations or presentations made at conferences, to give credibility to
statements and to appear as an authority on the topic under consideration.
The common use of email signatures at the end of posted messages, often providing information about
institutional affiliation, job titles or other credentials, plays a significant, although often unnoticed, role in
defining the authority of a discussant’s statements. Of the messages analyzed, 85% (365) had signatures
displaying institutional affiliation. Queries, on the other hand, are questions posed to the list that decrease
the credibility of other discussants and their claims about JPEG2000: “Can you point to reports showing
results of interoperability tests of MXF-wrapped JPEG2000 files created on one application and played
back/rendered on another vendor's product?” (AMIA-L, 4/10/13). This seemingly neutral query
effectively calls into question the claims by JPEG2000 supporters that the ongoing interoperability
issues associated with JPEG2000 had been solved. Shifting the focus of the debate to the unresolved
interoperability problems associated with JPEG2000 destabilizes the claims of its supporters that
JPEG2000 conforms to preservation ethics. Interoperability is seen as a critical attribute of a preservation format
since it ensures that files can be exchanged with other institutions and that files can be accessed on other
software and hardware configurations into the future.

5.4 Shifts in Epistemic Grounds


The preceding list suggests the range of epistemic techniques employed in the debate over this standard
during this time period. The most common form of argumentation relies on documentary evidence to
support its claims. Statements of fact and anecdotal evidence are common as well, relying on the positioning
of the speaker as an expert and/or long-time member of the community.

A key finding of this research is that, over time, the detractors of JPEG2000 began to move the
focus of their debate against JPEG2000 from the particular claims made by supporters to the grounds on
which these claims were being made. The detractors of JPEG2000 appear to continually shift the burden
of proof back from rhetoric to direct experience with the technology, constituting an emergent theme termed
individual technological engagement, which privileges individual experience with technology over the
knowledge of experts. While an economic approach might suggest that the lack of widespread adoption of
JPEG2000-based technology is due to economies of scale and the limited revenue streams of small
organizations, using the SKAD approach, we can see that another important factor may be conflicting expectations
within the community about the epistemic grounds upon which a technology may be understood. These
shifting epistemic grounds may be linked back to historical changes within the preservation field through
additional analysis, as discussed below in 6.1 Future Directions. Table 2, below, identifies key moments in
the development of JPEG2000 that could be further explored through interviews with the key individuals
involved.


2000 JPEG 2000, Part 1 becomes an International Standard (ISO/IEC 15444-1)
http://www.iso.org/iso/catalogue_detail?csnumber=27687

2001 JPEG 2000, Part 3 (for moving images) becomes an International Standard (ISO/IEC
15444-3) http://www.jpeg.org/jpeg2000/j2kpart3.html

2002

2003

2004 Digital Video Preservation Reformatting Project (a report by Media Matters for the Dance
Heritage Coalition analyzing options for preserving analog video content; used in AMIA-L
discussions as a key document to support the argument for using JPEG2000).
http://www.danceheritage.org/digitalvideopreservation.pdf

2005 The Library of Congress adopts the use of JPEG2000 file format for preserving images.
http://memory.loc.gov/ammem/help/mrsid.html

2006

2007 Federal Agencies Digitization Guidelines Initiative (FADGI) is launched under the auspices
of the National Digital Information Infrastructure and Preservation Program (NDIIPP) at
the Library of Congress. http://www.digitizationguidelines.gov/

2008

2009 Library of Congress adopts JPEG2000 for digitizing video collections. Library of Congress
NAVCC begins transferring analog video tapes to MXF-wrapped JPEG2000 files.
http://www.loc.gov/avconservation/preservation/projects.html
JPEG2000 Alliance formed. http://www.jpeg2000alliance.com/?page_id=2

2010

2011 JPEG 2000 Summit (Library of Congress, May 12-13).
http://www.digitizationguidelines.gov/resources/jpeg2000.html

2012 

2013 


Table 2: Events in JPEG2000 History 

6 Limitations


As a case study, the findings of this research are limited to the specific standard and community analyzed. 
In addition, analysis was limited to one forum of knowledge exchange, the popular AMIA-L listserv. Clearly, 
the public debate surrounding JPEG2000 took place in other forums, such as the ongoing work of the Federal
Agencies Digitization Guidelines Initiative, academic and practitioner-authored journal publications, and a
variety of conferences, including the annual AMIA conference. Additional research to triangulate data across 
these forums is necessary. Some of the key events in the history of JPEG2000, in need of further study, 
appear in Table 2, above. 


6.1 Future Directions 


The SKAD approach may be useful for other examinations of technology diffusion. Diffusion of innovations
(Rogers, 2003) approaches could be enhanced by affording weight to the role that knowledge techniques
play in determining how technologies are adopted across a community. In addition, Backhouse et al. (2006)
suggest a fruitful direction for considering standards within a circuits of power framework, which might be
enhanced by using SKAD to examine how the social distribution of knowledge contributes to the
configuration of circuits of power. Finally, further analysis of documents and events surrounding the adoption of
preservation standards in the moving image preservation community can help explore the genealogical
dimension of Foucault’s approach to discourse analysis. For instance, we might consider the role played by
the consolidation of institutional power when the Library of Congress became an early adopter of new
preservation technologies (such as JPEG2000), encouraging small organizations to follow suit, or the
changing professionalization of the field as graduate-level programs at universities⁶ begin to replace
long-term apprenticeships as the dominant forms of training. These are just a few of the possible historical
events that may be linked back to the discourses associated with the adoption of preservation standards.


7 Conclusion


The case of JPEG2000 as a standard considered for preserving analog video content provides an illuminating 
case study for the playing out of discourses in the construction of a preservation standard. JPEG2000 was 
adopted by the Library of Congress as a standard for both preserving still images (in 2005) and video (in 
2007). The issue of whether JPEG2000 should be adopted by the greater preservation community is still 
hotly debated. 

This research suggests the usefulness of taking a SKAD approach for extending our conceptualization
of how the members of a heterogeneous knowledge community debate the adoption of a standard,
including better understanding the possible range of epistemic techniques they employ, and the potential
shifts in interpretive frames and epistemological grounds that may be observed over time. This work seeks
to contribute to a growing body of literature on the sociology of standards, drawing attention to the complex
role standards play within communities tasked with preserving cultural heritage collections, and the role
that standards play in the social distribution of knowledge (Berger and Luckmann, 1966). In addition, it
offers new insight into a community of preservationists engaged in generating knowledge about a new, and
potentially disruptive, preservation technology through ongoing debate, showing how archivists
work to shape the traces of the historical record in the age of digitization for preservation (Conway, 2010).


⁶ University of California - Los Angeles and New York University both started their own graduate-level moving image preservation
programs in the early part of the time period under analysis, in 2002 and 2003, respectively.

Abstract

This paper proposes to enhance geographic information retrieval interfaces with geovisualization 
techniques such as multiple representations and rich interactions. It supports this proposition with examples 
and illustrations from the GeoPubMed prototype, which was designed to enhance geospatial access to medical 
citations in the PubMed database, the premier database of health publications. Such an enhancement 
expands user tasks, which become more analytical than simple reference question-answering tasks. 
1 Introduction 


Geographic information retrieval interfaces are becoming common enhancements to search engines, social 
networking systems, bibliographic databases, digital libraries, image retrieval systems, and many other 
information systems. The majority of such interfaces, however, answer simple questions such as Where?, 
Who?, What?, and When? (Jones, 2007). Social networking systems go beyond these questions and add 
information about places, and opinions about them by means of linking user-contributed reviews to maps 
(Bilandzic et al., 2008). The questions Where?, Who?, When?, and What?, along with reviews, could be 
characterized as reference information. While answering reference questions is important for exploration of 
geographic localities, it is not the only type of information that geographic retrieval systems can and should 
inform users about. In contrast, interactive geovisualizations support a much wider range of exploratory 
tasks. Specifically, geovisualizations are 
concerned with analytical questions and tasks. For example, they focus on facilitating comparisons, 
associations, identification of similarities and differences, and other tasks (Andrienko & Andrienko, 2006). 
Such analytical tasks help researchers think, notice patterns and outliers, synthesize information, derive 
insights from massive data, and complete many other higher order cognitive activities. 

These tasks could be beneficial to information seekers as well, especially when we take into account 
that many users of retrieval systems come not only to find answers to reference questions but also to 
analyze published information. There is substantial evidence in the research literature that researchers engage 
in analytical tasks as they analyze the spatial diffusion of publications, spatio-temporal patterns, social 
networks, and other aspects of documents (see, for example, recent studies by Merel et al., 2013; 
Pan et al., 2011; and Groneberg-Kloft et al., 2009). A search for “citation analysis” in PubMed retrieves 
around 3000 publications. 

This paper outlines how blending analytical tasks with geographic retrieval tasks could facilitate 
detection of the expected and discovery of the unexpected phenomena in document collections. We illustrate this 
with the GeoPubMed prototype, which provides access to PubMed citations, one of the most 
comprehensive collections of medical literature. 


2 Background 


Geographic information retrieval is an interdisciplinary concept that combines principles of information 
retrieval and digital maps. Early research efforts in geographic retrieval were focused on conceptualization 
of retrieval algorithms, geospatial relationships, automated indexing, development of gazetteers, modeling 
of metadata and ontologies, and conceptualization of basic interactions such as navigating, browsing, and 
searching (Hill, 2006; Larson, 1996). 

Meanwhile, geovisualization researchers have been preoccupied with a different research agenda. 
They were concerned with exploratory tasks (Andrienko & Andrienko, 2006), knowledge discovery, visual 
analysis (MacEachren et al., 2010), and other higher level cognitive activities. Such tasks and activities 
required not only highly interactive maps, but also a range of additional representations that can inform 
users about changes in objects’ attributes in space and time. Geovisualization tools help users make 
discoveries, generate hypotheses, reason, and understand. This paper demonstrates how the 
principles developed by geovisualization researchers can complement and improve representations of, and 
interactions with, geographic information retrieval results. 


3 GeoPubMed Prototype 


The GeoPubMed prototype is a search interface for PubMed that groups search results by location. It retrieves 
names of countries and states/provinces from the authors’ affiliation fields in PubMed. Geographic names are 
retrieved with the help of a gazetteer (a dictionary of geographic place names that binds names to 
coordinates), which is used for query expansion. When users specify search statements, their queries are 
expanded with geographic place names; the number of results for each location and the location coordinates are 
recorded in a KML file, which is later used for visualization. The gazetteer contains only place names of 
locations that are present in PubMed; it omits place names that have no matching 
documents in PubMed. The gazetteer is stored in a MySQL database; the visualization is based on the Google 
Maps interface, with additional representations and interactions developed in PHP, JavaScript, and 
D3.js. The prototype runs a number of searches in parallel to expedite retrieval. 
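
The query expansion and KML recording steps described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the prototype's own PHP/JavaScript code; the function names `expand_query` and `to_kml`, the two gazetteer entries, and the use of the `[Affiliation]` field tag to restrict a query are assumptions made for the example, and the result counts are supplied by hand instead of being fetched from PubMed.

```python
from xml.sax.saxutils import escape

# Hypothetical gazetteer rows: place name -> (latitude, longitude).
GAZETTEER = {
    "Germany": (51.17, 10.45),
    "Brazil": (-14.24, -51.93),
}


def expand_query(user_query, gazetteer):
    """Build one affiliation-restricted query string per gazetteer place."""
    return {
        place: f"({user_query}) AND {place}[Affiliation]"
        for place in gazetteer
    }


def to_kml(counts, gazetteer):
    """Serialize per-place result counts as KML placemarks.

    Places with zero results are skipped. KML lists coordinates as
    longitude,latitude.
    """
    placemarks = []
    for place, count in counts.items():
        if count == 0:
            continue
        lat, lon = gazetteer[place]
        placemarks.append(
            "  <Placemark>"
            f"<name>{escape(place)}</name>"
            f"<description>{count}</description>"
            f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
            "</Placemark>"
        )
    body = "\n".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
        f"{body}\n</Document>\n</kml>"
    )


queries = expand_query("tuberculosis", GAZETTEER)
# Hand-written counts stand in for the results of the expanded searches.
kml = to_kml({"Germany": 12, "Brazil": 0}, GAZETTEER)
```

In a full pipeline each expanded query would be sent to the search service (possibly in parallel, as the prototype does), and the returned counts would replace the hand-written dictionary.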


3.1 Representations 


Visualization of results in GeoPubMed has 
multiple representations. We use a heatmap of all publications in PubMed as a base layer and a layer of 
graduated circles to show search results. The density of the heatmap represents the density of results for each 
location. Currently the heatmap coverage is limited to the US. To generate this layer we queried PubMed 
with the BGN gazetteer, which lists 159,608 populated places for the US. This helped us identify 19,538 placenames 
mentioned in PubMed. The other placenames were not found in medical citations and for this reason are not 
used for retrieval. 

The results are distributed to multiple zoom levels that show countries, states/provinces, and 
smaller locations respectively. Each representation of results has an index of placenames, tightly coupled 
with the map. Both layers are shown in Figure 1. 
Figure 1: Search results for “tuberculosis”. Results are distributed to multiple zoom levels and are shown 
against the heatmap. 
Graduated symbols represent central points of locations (countries, regions, and smaller locations). Symbols 
are linked to a series of representations that show additional aspects of documents such as clinical facets 
(Fig. 2.a), ages of patients (Fig. 2.b), and years of publication (Fig. 2.c). 
Figure 2: a) a bar chart showing clinical facets; b) a bar chart showing ages of patients described in 
medical citations; c) a column chart of years of publication. 


3.2 Interactions 


GeoPubMed has a number of interactions that increase the chance of discovery in the visualization. Besides 
searching, it supports different kinds of sorting in the location index, visual annotation of markers, 
interactive tours, linking, and zoom-dependent selection. The map index can be sorted alphabetically or 
by the number of search results. Markers on the map and locations in the index change 
color as users open them, which helps users keep track of their previous actions. In addition, markers 
and locations in the index can be deleted or highlighted as special to help users annotate relevant and less 
relevant documents. Selection at the level of countries allows users to select multiple countries and view 
them through the lens of additional representations similar to the ones linked to markers. This facilitates 
answering questions about geographic regions that are not explicitly represented in the information system. For 
example, with this selection, the results from European countries can be grouped together so that users can 
see treatments used in Europe in the representation of clinical facets. Selection affects the ranking of search 
results: the ranking for an individual country differs from the ranking for multiple countries, where results from 
different countries are mixed. 
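
The two index orderings and the multi-country selection described above can be illustrated with a small Python sketch. The facet names and counts are invented for the example, and `merge_selection` and `sort_index` are hypothetical helpers, not functions of the prototype.

```python
from collections import Counter

# Hypothetical clinical-facet counts per country (illustrative values only).
FACETS = {
    "France": Counter({"Therapy": 40, "Diagnosis": 25}),
    "Germany": Counter({"Therapy": 30, "Etiology": 10}),
    "Japan": Counter({"Therapy": 50}),
}


def merge_selection(facets, selected):
    """Add up facet counts over every country in the selection."""
    total = Counter()
    for country in selected:
        total.update(facets[country])
    return total


def sort_index(facets, by="name"):
    """Order the location index alphabetically or by number of results."""
    if by == "count":
        return sorted(
            facets, key=lambda c: sum(facets[c].values()), reverse=True
        )
    return sorted(facets)


europe = merge_selection(FACETS, ["France", "Germany"])
by_count = sort_index(FACETS, by="count")
```

Grouping in this way produces a view (here, treatments across two European countries) that no single marker provides on its own.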


4 Knowledge Patterns 


The approach described in this poster offers a number of benefits to information seekers. It allows them 
to complete a number of analytical tasks and discover interesting patterns about medical knowledge while 
they search for information. First of all, distributing the results to multiple zoom levels shows cognitively 
plausible patterns of diseases, treatments, social networks, and diffusion of authors publishing in the same 
journal, which allows users to generate research hypotheses about the etiology of diseases, medical practices, and 
awareness of diseases. Users can easily identify hot spots or deserts of medical research at various levels of 
geographic specificity. They can explore distribution of treatments and draw inferences about how some 
treatments are more popular in some countries than in others. Researchers can study international 
collaborations and their publishing productivity. Together all these knowledge patterns may lead to new 
scientific discoveries and better understanding of public health issues. 

Currently, a user study is underway evaluating the usability of knowledge patterns. The goal of the 
study is to assess the merit of these patterns in analytical tasks and activities such as hypothesis generation, 
reasoning, spatio-temporal analyses, and other activities. 


5 Conclusion 


This paper demonstrates how applying the geovisualization principles of multiple representations and rich 
interactions enhances the tasks of information seekers. Specifically, their tasks become more analytical. These 
new tasks enable researchers not only to ask the questions Where?, When?, Who?, and What? but also to 
discover and explore interesting knowledge patterns at various levels of geographic granularity (at the level 
of countries, provinces/states, and cities). These patterns may lead to new research questions, hypotheses, 
and new ways of thinking. 
Abstract 

The cultural background of web users is argued to play a key role in the way they interact with and 
perceive the usability and usefulness of websites. The objective of this case study is to identify Arab 
users’ preferences and expectations of the design of Arabic websites in order to examine whether these 
preferences are consistent with their cultural-specific attributes as described and predicted by Hofstede’s 
model of cultural dimensions. Thirty-three participants from two Arab countries evaluated and compared 
two websites, one from their own country and a second one from another Arab country within the same 
culture. The preliminary results suggest that they show an overall preference for a website developed for 
their country over another from other Arab countries, even if they share the same culture.¹ 
1 Introduction 


One of the key objectives for any website is to enable its users to experience success and satisfaction 
(Fernandes, 1995; Nielsen, 2000). As such, it is argued that the accommodation of users’ attributes into the 
design process is essential for the usability and usefulness of the web (O’Connell & Murphy, 2007). The 
cultural background of users is considered one of the attributes that might affect users’ performance and 
satisfaction while interacting with websites (Shneiderman, 2000). The bulk of the research investigating this 
domain has employed Geert Hofstede’s cultural model (Hofstede, 1980, 2001), based mainly on the 
interpretation of Marcus and Gould (2000). In his model, Hofstede assigned comparative scores for 50 
individual countries and three regions on five cultural dimensions. These dimensions comprise: Power 
Distance (the extent to which the less powerful members of societies expect and accept that power is 
distributed unequally); Individualism (the extent to which individuals are integrated into groups); 
Masculinity/Femininity (assertiveness and competitiveness versus modesty and caring); Uncertainty 
Avoidance (intolerance for uncertainty and ambiguity); and Long-/Short-Term Orientation (the degree of 
future vs. historic orientation of the culture). 

In the case of the three regions, one of which is the Arabic-speaking region, several countries had 
been grouped together based on the assumption of having similar cultural traits. The Arabic-speaking region 
comprised Egypt, Lebanon, Libya, Kuwait, Iraq, Saudi Arabia and the United Arab Emirates. This group 
has been frequently applied in cross-cultural interface design studies to different extents (Barber & Badre, 
1998; Callahan, 2007; Zahir, Dobing, & Hunter, 2002). However, grouping the seven countries into one 
group, excluding the rest of Arab countries, and assuming that Arab users have similar needs, expectations, 
and preferences on the web, without acknowledging possible individual differences across countries, can 
create potential problems in Arabic interface localization. Therefore, the main objective of this study is to 
investigate whether Arab users have similar expectations and preferences when it comes to web interfaces, 


¹ Part of this study was published in Khashman, N. & Large, A. (2013). Arabic website design: User evaluation from a cultural 
perspective. In Rau, P.P.L. (Ed.): Cross-Cultural Design. Cultural Differences in Everyday Life, LNCS 8024, pp. 424-431. doi: 
10.1007/978-3-642-39137-8_47 
considering that Arab countries are treated as one composite group in cross-cultural web analysis. The first 
part of the study, which is reported here, asked users from Jordan (i.e. excluded from Hofstede’s model) to 
compare and evaluate a website from their own country with another from Lebanon (i.e. included in the 
model). The second part asked users from Lebanon to evaluate the same two websites used in the first part. 


2 Methodology 


2.1 Participants 


The population for this study is native Arabic-speaking internet users living in Jordan and Lebanon. Twenty 
Jordanians (first group) and 13 Lebanese (second group), aged 20 to 65, took part in 
evaluating the design of two Arabic websites. These participants were recruited by the researcher through 
personal contacts in two cities in Jordan and Lebanon. The study complied with all ethical 
guidelines of the researcher’s university. 


2.2 Websites and Tasks 


A random government website was chosen from Jordan’s e-Government portal and then matched with the 
equivalent website from Lebanon. Using the Ministry of Health’s websites in these two countries, each 
participant was handed a paper with questions asking her/him to find a list of public hospitals and contact 
information for each ministry. These tasks were chosen because they were consistent across the two websites, 
and because they require the participants to perform some searching and browsing through the websites to 
find the answers. The participants also filled out a pre-task questionnaire for the demographic information, 
and a post-task questionnaire that inquired about what the participant thought of the design of these two 
websites. The general questions included in this questionnaire were adapted from Marcus and Alexander 
(2007); for example, how would the participants describe the imagery of the websites, would these websites 
appeal to people from their country, and what content is missing. The more specific questions about design 
elements such as keyword searching and customization were derived by the researchers based on previous 
work (Khashman & Large, 2011, 2012). 

This study has a number of limitations that might have affected the results and should be taken 
into consideration in future research. First, 20 Jordanian participants took part in the first 
stage of the evaluation process, a number not matched by the 13 Lebanese participants who took part 
in the second stage of the study; a larger sample might be needed to detect statistically significant 
differences. Secondly, the evaluation involved just two websites, both governmental, which could have influenced 
the design of the websites, and hence users’ perception of this design. Thirdly, due to internet connection 
problems and several power failures in Lebanon, participants’ searching times were dropped in favor of their 
subjective satisfaction with the websites. Future research will take these limitations into consideration and 
investigate users’ preferences and expectations in regard to Hofstede’s model. 


2.3 Analysis 


For the quantitative aspect of this study, statistical analyses were performed in SPSS according to the level of 
measurement of each variable. Descriptive statistics were used to describe continuous 
variables such as time for searching and completing tasks. 


3 Findings 

The preliminary results from the first group of participants show that out of 80 tasks that were undertaken 
on the two websites, only four were unsuccessful, divided equally between the two websites. For the 
remaining successful tasks, the participants performed the search and found the answers somewhat faster on 
the Jordanian website, although the difference between the two websites was not statistically significant 
and the participants themselves thought that their performance on the two websites did not differ much. 

On a 5-point Likert scale, 1 being the lowest and 5 the highest, the participants were asked how 
much they liked the design of both websites. The first group rated the website from their own country 
slightly higher than the Lebanese one (3.60 and 3.25, respectively). The same held for the second 
group, who rated the design of the website from their own country higher than that of the one from 
Jordan (3.38 and 3.23, respectively). 

A significant majority of the participants from the first group preferred to use Jordan’s website 
(χ²(1)=5.00, p=.025) and found it easier to use (χ²(1)=9.80, p=.002), but they did not think it was much 
faster to use (χ²(1)=3.20, p=.074), which was confirmed by the actual results of search times. For these 
three categories, the results from the second group were not significantly different. 
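
The paper does not name the test behind these statistics or report the underlying counts. As a worked illustration only, the sketch below assumes a one-sample chi-square goodness-of-fit test against an even split among the 20 first-group participants; the 15/5, 17/3, and 14/6 splits are assumptions chosen because they happen to reproduce the reported values, not figures taken from the study.

```python
import math


def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is a squared standard normal variable, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Assumed splits of 20 participants (hypothetical, see the note above).
for label, split in [("preference", (15, 5)),
                     ("ease of use", (17, 3)),
                     ("speed", (14, 6))]:
    chi2 = chi_square_gof(split)
    print(f"{label}: chi2(1)={chi2:.2f}, p={p_value_df1(chi2):.3f}")
```

With two categories there is one degree of freedom, which matches the "(1)" in the reported statistics.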

The second part of the post-task questionnaire asked the participants to rate on the Likert scale 
how important it was for them to have specific design elements on any website; these elements were 
previously identified as culturally specific markers (Barber & Badre, 1998). As shown in Table 1, 
participants thought it was very important to have keyword searching, a supporting second-language version 
of the website, and a site map, while customization was less important for the first group and keyword 
searching was less important for the second group. When asked about having restricted sections on websites 
to access information, 79% of all participants preferred not to have such a feature on either website. Table 2 
provides the results of image preferences for the first and the second group, reporting the highest percentage 
between the categories. 


Element              First Group (Jo)      Second Group (Leb) 
                     M        SD           M        SD 
Customization        3.70     1.34         4.08     .86 
Second Language      4.60     .99          4.69     .63 
Keyword Searching    4.75     .72          4.54     .97 
Site map             4.25     1.16         4.23     .83 
Animated Images      3.25     1.45         3.23     .73 


Table 1: Importance of specific design elements 


                      Number of people         Gender                 Status 
Element               (Group vs. Indv.)        (Men vs. Women)        (Officials vs. Citizens) 
                      Jo          Leb          Jo         Leb         Jo             Leb 
First group (Jo)      40% Group   35% Group    65% Mix    65% Mix     35% Neither    35% Neither 
Second group (Leb)    69% Group   69% Group    61.5% Mix  61.5% Mix   46% Citizens   46% Citizens 


Table 2: Image preferences 


4 Conclusion 


The 33 participants compared and evaluated two websites from two Arab countries, Jordan and Lebanon. 
The main objective of this study was to explore how people from a country that is excluded from Hofstede’s 
model (i.e. Jordan) perceive and compare a website from their country with another one from a country 
that is included in the model (i.e. Lebanon), and vice versa. This is one part of a larger study that aims to 
explore whether users’ expectations and preferences of Arabic web design match the Arabic cultural-specific 
attributes which are described and predicted by Hofstede’s model of culture. The findings from this small 
study suggest that users show an overall preference for a website developed for their country over another 
from other Arab countries even if they share the same culture. 

Arab countries have been occasionally treated as one entity in web design, just as in Hofstede’s 
cultural model, which could explain why many international corporations and NGOs, among other 
institutions, tend to have one global website for Arabic-speaking countries. If websites were localized by 
adjusting cultural design elements to conform to the target audience in each Arab country, it would help 
corporations create global demand, and establish a reliable, professional and international image online 
(Sun, 2001). 

Therefore, the findings might indicate that country variations within the same Arabic culture 
should be taken into consideration when designing websites for that culture, whether it is a global, 
international, or a local design. For example, customization was found to be more important for the 
Lebanese participants, which suggests they prefer to have site customization tools such as changing the font size 
or the background color, whereas Jordanian participants indicated a greater need for keyword searching 
than their Lebanese counterparts. These preferences need to be taken into consideration from the early 
stages of the design process to match those particular countries. 

The findings also suggest a need for a more comprehensive study underlining not only the 
similarities, but also the differences between Arabic web interfaces based on the design characteristics 
inferred from Hofstede’s model. The results from this study could also be used to improve the design of 
Arabic government websites in accordance with the cultural markers associated with their culture, whether 
as described by Hofstede or not. 
Abstract 

Scholars from numerous disciplines rely on collections of texts to support research activities. On this 
diverse and interdisciplinary frontier of digital scholarship, libraries and information institutions must 1) 
prepare to support research using large collections of digitized texts, and 2) understand the different 
methods of analysis being applied to the collections of digitized text across disciplines. The HathiTrust 
Research Center’s Workset Creation for Scholarly Analysis (WCSA) project conducted a series of focus 
groups and interviews to analyze and understand the scholarly practices of researchers that use large-scale, 
digital text corpora. This poster presents preliminary findings from that study, which offers early 
insights into user requirements for scholarly research with textual corpora. 
1 Introduction 


To answer research questions about topics ranging from literary form to language and culture, humanities 
researchers may work with large numbers of complete volumes or smaller, hand-selected sets. While some 
researchers analyze the base texts, others interpret features derived from them. 

Libraries and cultural-memory institutions must prepare to support research using large collections 
of digitized texts for analysis, or corpora, and need to understand the different methods of analysis applied 
to corpora across disciplines. The HathiTrust Research Center’s Workset Creation for Scholarly Analysis: 
Prototyping Project (WCSA) conducted a series of focus groups and interviews to understand the scholarly 
practices of researchers using large-scale, digitized text corpora. 

The HathiTrust (HT), a repository of over 10 million volumes (3 billion pages) of text, serves as a 
type of corpus: an expansive aggregation of distributed sources from which related sources may be 
concentrated by researchers into densely thematic bodies of evidence.¹ This aggregation consists of not just 
its primary constituents (books), but also the bibliographic metadata, and even intra-book content, such as 
formal sections, captioned images, maps and charts, and indexes. The HathiTrust Research Center (HTRC) 
is the research branch of the HathiTrust.² The HTRC offers a suite of tools and services, which enable 
computational access to the HT corpus. From digitized library collections in HT, scholars select subsets for 


¹ http://www.hathitrust.org/htrc/ 
² http://www.hathitrust.org/ 
computational analysis according to their particular research objectives. We refer to these subsets, along 
with associated, external data sources, as “worksets.” Worksets are a type of machine-actionable, referential 
research collection. User requirements for workset creation grow increasingly sophisticated and complex as 
humanities scholarship becomes more interdisciplinary and more digitally-oriented over time. 
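
One way to picture a "machine-actionable, referential" collection is as a record that stores identifiers pointing into the corpus rather than copies of the texts. The sketch below is purely illustrative: the `Workset` class, its field names, the `merge` method, and the placeholder identifiers are all assumptions for this example and do not describe the HTRC's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workset:
    """A referential collection: it holds identifiers, not the texts."""
    title: str
    creator: str
    volume_ids: frozenset          # references to volumes in the corpus
    external_sources: tuple = ()   # associated external data sources

    def merge(self, other, title):
        """Combine two worksets into a new one without copying any text."""
        sources = tuple(
            dict.fromkeys(self.external_sources + other.external_sources)
        )
        return Workset(
            title=title,
            creator=self.creator,
            volume_ids=self.volume_ids | other.volume_ids,
            external_sources=sources,
        )


# Placeholder identifiers and URL, invented for the example.
gothic = Workset("Gothic novels", "scholar-a",
                 frozenset({"vol-001", "vol-002"}))
sensation = Workset("Sensation novels", "scholar-a",
                    frozenset({"vol-002", "vol-003"}),
                    ("https://example.org/reviews.csv",))
combined = gothic.merge(sensation, "Victorian popular fiction")
```

Because only references are stored, worksets built this way can be shared, cited, and recombined cheaply, which is the property the term "referential" points to.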

HTRC holds transformative promise for humanities scholarship: it seeks to enable scholars to sift 
through a massive corpus and to construct therefrom precise worksets for investigation. How scholars use 
collections and worksets remains a central research question in this initiative. Under the auspices of the 
HTRC, the WCSA team conducted a series of focus groups and interviews investigating how to facilitate 
scholarly selection of digital research materials. 

WCSA is a two-year effort, funded by the Andrew W. Mellon Foundation, which aims to engage 
scholars in designing tools for exploration, location, and analytic grouping of materials so that they can 
routinely conduct computational scholarship. The three major goals of the WCSA project are to 1) enrich 
the metadata in the HT corpus, 2) improve access and discovery through reference-able metadata, and 3) 
formalize the notion of collections and worksets in the context of the HTRC. This study gathers qualitative 
data on scholarly practices with text corpora to inform the development of tools and services for HTRC. 


2 Background 


The use of digitized, primary source materials is growing in value and prominence among humanities 
scholars (Brogan, 2006; Palmer, 2005). In addition, the act of bringing together related information from 
various kinds of collections is essential to their research processes (Warwick, et al., 2008; Sukovic, 2008; 
Sukovic, 2011). In the course of their work, researchers create their own “digital aggregations of primary 
sources and related materials that support research on a theme” (Palmer, 2004). 

Scholars rely on collections of texts to support research activities across numerous disciplines, 
ranging from physics and public health to English and computer science (Underwood, 2013; Argamon, et 
al., 2009; Heuser & Le-Khac, 2012; Moretti, 2009; Petersen et al., 2012). In certain domains, scholars create 
personal, digital carrels, gathering subsets of texts amenable to in-depth analysis using advanced tools and 
services (Mueller, 2010). Research collections comprise a variety of media and formats, which together 
function as a coherent collection of interwoven content and context (Brockman, et al., 2001). Previously 
identified scholarly needs for conducting research with digital collections include the need for bibliographic 
and evaluative tools for building thematic, curated sub-collections, infrastructure for ensuring sustainable 
and well-prioritized digitization of materials, and digital collections that cover a breadth of temporal and 
geographic areas (Palmer, et al., 2010; Proffitt, et al., 2008; Meyers 2010, 2011; Sinn, 2012). 

Scholars also play a critical role in shaping how librarians and information scientists formalize 
collections to support research activities. A 2010 Council on Library and Information Resources (CLIR) 
report warned: 


While a greater reliance and dependency on digital resources is inevitable, the quality of the data 
and their organization and accessibility in service to teaching and scholarship are major concerns. 
Without the guiding voice of scholars, the tremendous effort now being devoted to digitizing our 
cultural heritage could in fact impede, not facilitate, future research. (CLIR, 2010). 


In 2011, the Center for Informatics Research in Science and Scholarship at UIUC surveyed digital 
humanities scholars who were awarded Google Digital Humanities Awards and given large-scale text corpora 
from Google Books for their research projects. Among the major challenges and areas of need identified in 
the study’s findings were 1) identifying and retrieving materials and 2) identifying characteristics of textual 
content. The authors noted: 


Researchers do not necessarily need huge sets of data to do interesting work, but the implication is 
that they do need flexible data delivery services that can deliver different kinds of data in different 
formats based on different searches for different kinds of research at different times. (Varvel & 
Thomer, 2011) 


Developing such flexible services requires ongoing inquiry into the research practices of specific disciplines 
working with these sources, including investigation into the types of research questions posed by scholars 
and the types of analytical methods employed. 


3 Methods 


This study addresses the research question: How do researchers, especially humanities scholars, use 
collections in the course of their research, particularly in the context of textual corpora? The WCSA team 
collected data through semi-structured focus groups and interviews, which targeted researchers in the 
humanities and others working with digital collections. 

Participants were asked about how they identify, select, and obtain access to texts for inclusion in 
analysis; transformation and pre-processing steps; units of analysis (works, manifestations, pages, n-grams, 
OCR, images, etc.); methods of analysis; problems encountered in obtaining text corpora and materials not 
currently existing in digital form; and challenges to working with these digital collections (e.g., OCR quality, 
duplication). 

Focus groups and interviews were conducted at the Digital Humanities 2013 conference, the 2013 
Joint Conference on Digital Libraries, and the 2013 HTRC UnCamp. Thirteen individuals participated in 
the focus groups and five scholars were interviewed, for a total of eighteen participants in the study thus 
far. 

Focus group and interview recordings were transcribed, and transcriptions are being manually coded 
to identify emergent themes. Each transcription is coded multiple times to ensure inter-coder reliability. 
Further content analysis is ongoing. 
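
The study does not say how inter-coder reliability is measured. One common statistic for two coders is Cohen's kappa, which corrects raw agreement for agreement expected by chance; the sketch below assumes that measure, and the code labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders labeling the same segments."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty segments")
    n = len(codes_a)
    # Share of segments on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical theme codes assigned to four transcript segments.
coder_1 = ["collecting", "metadata", "collecting", "units"]
coder_2 = ["collecting", "metadata", "units", "units"]
kappa = cohens_kappa(coder_1, coder_2)
```

Here the coders agree on three of four segments (0.75), chance agreement is 5/16, and kappa comes to 7/11, or about 0.64.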


4 Preliminary Results and Discussion 


Participants included junior and senior faculty at liberal arts colleges and universities, computer 
programmers, librarians, data scientists, academic technologists, and graduate students. Scholars were 
specialists in English literature, classics, linguistics, library and information science, and history. 
Participants were affiliated with academic institutions located around the world, including Great Britain, 
Singapore, Germany, France, and different regions of the United States. 

A set of key themes has emerged from preliminary analysis. The following three examples illustrate
the roles of collections; the need to implement granular, actionable units of analysis; and the importance of 
expert-enriched, shareable metadata. 

1) Researchers consider the processes of collecting and workset-building to be basic scholarly 
activities. Researchers collect on the basis of diverse criteria, but aim for exhaustiveness within defined
analytic constraints: for example, complete representation of a genre over some period of time, complete 
representation of the works by a demographic, or a complete lexicon of some language, in print, for a certain 
time period (Figure 1).

“collection-building is scholarly activity... we also need to
think about how to document not just the status of different
versions but also the labor that goes into and the kinds of
knowledge that go into the decisions in making a collection,
and the knowledge that’s gained from that process.”
“Today it is viewed as something very technical to prepare a
corpus. But I think it’s getting more and more... interesting
to do. And one day, it will be unrelated to technical stuff,
and it will get closer to something of value.”


“the valorization of corpus-building...The 
recognition at the scientific level” 


“I’m learning a lot through this organizing of my material and it’s informing
what will be the main argument of my research” 


“[If] I have a corpus and nobody is allowed to see it but wonderful things
come out of it.. That’s not really research.. We are trying to get
accountability for the kind of work we are doing. And it’s important for us to 
show the basis of our work.” 


Figure 1: Selected focus group and interview excerpts on collection- and workset-building. 


Some noted that, although collecting and workset-building are intellectually rigorous labor requiring
careful and refined analysis, their value may go unacknowledged by the scholarly community.
There may be an interesting analogy to be made here with the often under-recognized intellectual work 
invested in the preparation of scholarly, edited collections of texts (Fraistat, 2012). This is an activity that 
is increasingly likely to become a species of workset creation, as the workflows (for preparing both print 
and scholarly editions) become more and more digitized over time. 

2) Researchers desire that collections, worksets, texts, and other objects of analysis be highly 
divisible, and that resultant pieces be identifiable, movable, and readily associable with highly granular 
metadata--what Mueller calls “re-diggable and multiply recombinable data” (Mueller, 2012). Participants 
described a range of targets for analysis: full authorial oeuvres, individual novels, pages and page images, 
word tokens, n-grams (possibly tagged with parts of speech), poems within books, notions or themes, 
characters, encoded TEI elements, lexicons, and more. They want to move subsets of worksets, or
different logical or syntactic pieces of their data, between tools, collections, processes, formats, and
standards, and to track them throughout (Figure 2).

“we need ways to slice this book. So we need to slice it
by page...We need to slice it by poem, which doesn’t 
conveniently overlap or match the page boundaries. We 
potentially need to slice it by sections within a poem...” 


“they use a lot of corpus configurations, like subcorpora. Subcorpus 
building... And partitions-building. Partition is to slice the corpus in 
parts, the sum of which is the whole. So this is for contrastive 
analysis” 


“Books are often not interesting without 
knowledge of the logical works or units 
within...” 


“that’s a whole different dicing intellectually ... Being able to 
support the huge variety of those kinds of ways of thinking 
about [texts] at that logical level is a bit challenging. But I think
it’s one that somehow has to be approached...” 


“We have words, text units, and intermediate structure. Those 
three levels hold different types of properties” 


Figure 2: Selected focus group and interview excerpts on divisibility and objects of analysis.


The HathiTrust corpus is arguably better poised to support such a service than other existing repositories,
such as the Google Books corpus, for two reasons. First, the HathiTrust, with its strong roots in
research libraries, is more oriented towards serving the scholarly community than Google, which serves
a wider constituency in which the general public is a more significant component. Second, Google, with
its roots in information retrieval, is oriented towards searching for and retrieving information globally
(its motto is “to organize the world’s information”), but not specifically towards the differentiation or
sectioning of search spaces, which is the focus of the workset idea. This emphasis on differentiated and
sectioned spaces, which distinguishes WCSA, is particularly relevant to the humanities. Unlike the
sciences, which seek to discover universal and immutable laws of nature, the humanities are typically
concerned with contextual relationships, that is, the relationships that obtain between texts within a
specific, situated context of inquiry. This is likely to remain true when the methods applied to
humanistic inquiry are digital, as noted by Matthew Jockers (Jockers, 2013). Another literary scholar,
Andrew Piper, makes a similar point about the needs left unmet, paradoxically, by that very globality:
the lack of fine-grained differentiability and, in particular, the lack of personalized sectioning or
slicing of the search space in the Google Books affordances (Piper, 2013).
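
The kinds of slicing that participants describe in Figure 2 (by page, by poem, by partition) can be illustrated with a small sketch. The following Python example is not part of the WCSA project or of any HTRC tool; the `Span` class, the function names, and the toy text are assumptions made purely for illustration. It represents each unit of analysis as a labelled character range over one text, so that page and poem boundaries may overlap freely, and it checks whether a set of slices forms a partition in the sense one participant described (parts whose sum is the whole).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    """A labelled character range [start, end) over one text (hypothetical structure)."""
    kind: str    # unit of analysis, e.g. "page" or "poem"
    label: str
    start: int
    end: int


def slice_text(text: str, spans: List[Span], kind: str) -> Dict[str, str]:
    """Return the pieces of `text` covered by spans of the given kind."""
    return {s.label: text[s.start:s.end] for s in spans if s.kind == kind}


def is_partition(spans: List[Span], kind: str, length: int) -> bool:
    """True if spans of this kind cover [0, length) with no gaps or overlaps."""
    position = 0
    for s in sorted((s for s in spans if s.kind == kind), key=lambda s: s.start):
        if s.start != position:
            return False
        position = s.end
    return position == length


# Toy data (assumed): two five-character "pages" and two "poems"
# whose boundaries do not line up with the page boundaries.
text = "AAAABBBBCC"
spans = [
    Span("page", "p1", 0, 5),
    Span("page", "p2", 5, 10),
    Span("poem", "poem-1", 0, 4),
    Span("poem", "poem-2", 4, 8),
]

pages = slice_text(text, spans, "page")
poems = slice_text(text, spans, "poem")
pages_partition = is_partition(spans, "page", len(text))
poems_partition = is_partition(spans, "poem", len(text))
```

In this toy case the pages partition the text while the poems do not (the final two characters belong to no poem), which mirrors the participant's remark that poems do not conveniently match page boundaries.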

3) Researchers critically need more and better metadata, beyond conventional bibliographic 
metadata, for multiple aspects of the scholarly research process—from precise retrieval of texts to defining 
units of analysis. Participants noted a common desire to share their expert-created or -enriched metadata 
more broadly, much as they would disseminate results of uniquely created analytic work. Participants also 
expressed interest in collaborative, curatorial work on texts themselves, such as editing, encoding, or
enriching the outputs of digitization (Figure 3).

“The book is not a unit of great interest – you want all the poems that aren’t
listed in the metadata. The metadata from the library is very coarse,
especially in respect to the goal you have. There’s no opportunity for the
experts to provide the deep metadata to share in the broad infrastructure
that librarians do very well.”


“Collaborative curation... You could create 
the data collaboratively, and then explore 
them collaboratively” 


“one thing is getting the data out. But then the next step is, you’ve done
all this work, and you then have the authoritative metadata. You have the
best metadata in the world, and no one will take that from you. Because
it has not been blessed.”


“it would be very important to have the ability to say
[of the metadata], this is wrong ...having a workflow
which supports that would be important. So the whole
idea of social addition comes really into play here.”


Figure 3: Selected focus group and interview excerpts on metadata enrichment and sharing. 


5 Conclusion 


Based on preliminary analysis, participants’ responses indicate the need for formalized workset protocols
that allow scholars to identify, select, and pull together subsets of texts within massive corpora. Ongoing
data analysis will inform development of tools and services for HTRC, and best practices for other
large-scale corpora. The study of user requirements for digital collections is critical to meeting the needs
for rising levels of scholarly research with digital materials.


The project assessed approaches to creating an alternate reality game (ARG) for students to learn
baseline concepts and skills of informatics in the introductory informatics course at the University of


1 Research Question

Our research question is inspired by personal experience with the difficulty of creating engaging assignments 
through which informatics students can learn to apply theoretical frameworks in information science, 
thereby gaining valuable skills for their future professional lives as information systems designers and 
evaluators (Bachelor of Science in Informatics: Careers, n.d.). Our question is as follows: How could an 
ARG be deployed as part of an undergraduate informatics course to give students engaging, hands-on 
learning experiences and lasting professional skills using the concepts covered in the curriculum? 


2 Alternate Reality Games (ARGs) 


An ARG does not take place strictly in the real or virtual world. The “real world” generally comprises
spaces in the physical world, and the “virtual world” includes sites on the World Wide Web, e-mail,
Internet forums, or instant messaging. An ARG encourages participants to construct a narrative, which
describes an alternate reality spanning the real and virtual worlds. Game players interact with the game 
designers or “puppetmasters” (McGonigal, 2007) through direct communication with in-game characters 
(e.g., via email or text). These interactions connect the real and virtual worlds, and entice players to learn 
about the narrative underlying the individual tasks to move game play forward. 

Game players are led into the “alternate reality” by unlocking clues and riddles embedded in online 
content, such as images and audio files on websites, or mysterious URLs shared in public spaces (McGonigal, 
2007). ARG players might not even be certain when they are or are not in the game since the “magic circle” 
of an ARG does not have clearly marked boundaries — as with a sport court or board game — separating it 
from the world outside the game (Huizinga, 1955; Jonsson et al., 2006). ARGs, therefore, are sometimes 
spoken of as “pervasive games” (McGonigal, 2003).


3 Games and ARGs in Learning


Existing literature about learning describes a spectrum of game integration, from superficial
“gamification” of traditional learning practices to harnessing educational connections in youths’ independent 
play of commercial video games outside school (Gee, 2003; Tobias and Fletcher, 2011). Video game 
enthusiasts have also suggested possible ARG applications in the classroom (Penny Arcade, 2012). However, 
the use of ARGs or other pervasive games to supplement or support learning in formal contexts has not yet 
been well explored. 

There has been increased interest in the “flipped classroom” and other new media methods, in
which students watch videos or play games to prepare for in-class discussion and work. However, the use
of ARGs to blend in-class and out-of-class experiences is still very new. A few studies have indicated that
ARGs may be effective ways to support learning. For example, Connolly, Stansfield, and Hainey (2011)
examined the use of an ARG to support the learning of foreign languages, and Whitton (2009) described an
ARG piloted to help incoming undergraduates at a UK university become oriented to their new city, 
campus, and library resources. ARGs also have the potential to improve learning by affording students an 
opportunity to process course content and consider how to apply it (Bransford et al., 2000:58ff). 


4 Methods and Results 


We used three methods of data collection to gather information about the design domains from stakeholders. 
First, we interviewed the instructor of the informatics course. The instructor reviewed the syllabus, 
indicating subjects that might be appropriate for an ARG experience, such as Value Sensitive Design 
(Friedman et al., 2008) or information security. He also described current technology used in the course, 
such as in-class engagement using a chat room backchannel, and the learning management system for 
delivering course content. 

Next, we conducted a focus group with four undergraduate teaching assistants (TAs) who had 
worked on the course, using a short participatory design activity to brainstorm ideas for game storylines or 
topics in the course that would benefit from a gaming application. The TA feedback indicated a preference 
for a focused game experience, covering only one of the syllabus topics, and they suggested Value Sensitive 
Design as a possible syllabus topic for the ARG experience. The TAs also discussed incentives
for playing the game, coming to a consensus that the game should be a voluntary experience and should
offer rewards related to classroom performance (extra credit) as well as indicators of mastery within the
game (e.g., a “leader board” or ranks for contributing players).

Finally, we designed a brief online survey to distribute to former students in the course. We emailed 
a link to the undergraduate association listserv and received 49 responses, gathering information about 
existing sites and tools students use to collaborate in and out of the classroom. The students’ responses 
showed a strong preference for using collaborative tools such as Google documents (for production) and 
Facebook (for discussion). Students also answered questions about game play, and indicated they undertook 
a range of roles in collaborative game play, such as leader, organizer, or contributor. We also asked students 
about their favorite games, and the results revealed a strong preference for both puzzle and role-playing 
games. 


5 Theoretical Perspective


Several design issues emerged during our data collection. We found Schön’s (1991) concept of design domains
helpful in considering the following topics:


• Structure: Existing socio-technical systems would influence how students interact with TAs, the
instructor, and each other. Incorporating digital tools that students already use would reduce
barriers to participation.

• Learning Objectives: We needed to complement the existing syllabus and course activities, setting
out clear goals for student learning through participation in the ARG.

• Resource Investment: Resources allocated for designing, running, and evaluating game play should
not detract from the administration of the course.

• User experience: The technology tools and platforms used in the game should allow students to
collaborate to complete game-related tasks. In addition, story line elements and character
interactions students experience should not be distressing in any way.


Using these design domains to analyze the data gathered from the instructor, TAs, and former course 
participants, we prepared an initial outline for a pilot iteration of the game. 


6 Game Outline 


Based on data gathered from the instructor, TAs, and former students, as well as the design domains
outlined above, we suggested the following elements for the design of the ARG experience in the informatics course:


6.1 Design domain: Structure 


Game duration was set at two weeks based on instructor preference and feedback from TAs. The two-week 
period will include four distinct “levels” of task completion. 

Virtual spaces used will include a game content website, the course website, as well as Google 
documents, in which students can collaborate and complete information synthesis tasks to advance game 
play. 


6.2 Design domain: Learning objectives 


The syllabus topic chosen for the focus of the game is Value-Sensitive Design (VSD; Friedman et al., 2008), 
and the game tasks will give students applied practice using VSD for a specific design problem. 

Applied practice is emphasized in the game design. VSD is a subject students will learn with more 
mastery in an experience that allows them to create, rather than consume, content (Gee, 2003). The VSD 
topic also occurs early in the academic quarter for this course, before midterms, thereby increasing the 
capacity for students to participate in game play. 


6.3 Design domain: Resource investment 


The game’s puppetmaster will be played by a graduate teaching assistant. This TA will run the game 
website, track student participation, and interact with students as the main game characters. 

The course instructor will direct students to the virtual help sources in the game if they ask for
help, and will thus incur minimal additional work in running the game.

Undergraduate TAs for the course will monitor student discussions on the course website and send 
any relevant questions or content to the game puppetmaster. 


6.4 Design domain: User experience 


Participation in the game will be encouraged, but voluntary. Incentives for participation in the game will
include player rankings, posted publicly on the course website, that indicate contribution and mastery of
the subject material. Extra credit will also be awarded for participation.

Two characters will guide the students in completing game tasks: Dorothea, an alien being, and 
Professor Ren, a fictional information science professor. We chose a female and a male character to include 
all students equally. 

The entry point (a URL) to the game narrative will appear both on the course website and on lecture
slides during class, leading students to the website of the first character, Professor Ren.

The narrative will begin with Professor Ren asking students for help in designing a communication
tool for a lost and wandering alien, who calls herself Dorothea. The game is divided into four major
components. First, students assemble a list of existing communication tools and organize them by
capabilities, to Professor Ren’s satisfaction. Second, Professor Ren will introduce VSD to the players
concurrent with the VSD lecture topic in class; the character will then assist students in making a list of
questions to ask Dorothea about her communication values, based on the VSD framework. The third and
fourth components of the game involve synthesizing Dorothea’s user requirements and values into a
solution for her.

Success in the ARG will be tracked by the game’s puppetmaster, and the Professor Ren character 
will use his in-game “authority” to rank players publicly, giving students a feeling of achievement and 
mastery. 


7 Future Steps 


We plan to give students the opportunity to gain a feeling of mastery in Value-Sensitive Design, an
important theoretical concept that they will have applied to a practical problem through game play. We
intend to run a first iteration of the game as proposed in this paper. Player and student feedback will be
welcomed as part of the game’s evaluation, and we hope to bring an engaging experience to this course
that enhances student learning. We will then share the elements of the game that prove more or less
successful, with the intent of inspiring more widespread incorporation of gaming experiences in
the postsecondary classroom.


Abstract

There is ample evidence of the influence of individual differences on information-seeking behaviours. 
Trailways and paths are increasingly important objects to support internet navigation. The EU-funded 
PATHS (Personalised Access to Cultural Heritage) project is investigating ways of assisting users with 
exploring a large collection of cultural heritage material taken from Europeana, the European aggregator 
for museums, archives, libraries, and galleries. A prototype system has been developed that includes 
innovative functionality for exploring the collection based on Google map-style interfaces, data-driven 
taxonomies, and supporting the manual creation of guided tours or paths along with the use of 
personalised (and nonpersonalised) recommendations to promote information discovery. After analysing 
the paths created by participants during an extended user evaluation, this paper discusses the effect of 
individual differences on path creation and characteristics.


1 Introduction


As the amount of information available through the internet grows and its complexity increases, so too does 
the necessity of helping users navigate the cultural heritage information space (Brenner & Mihalega, 2006). 
Traditional information retrieval behaviours may be appropriate for domain experts who are performing 
known-item searches (Sutcliffe & Ennis, 1998), but novice users need guidance and assistance to achieve 
their information goals. Walden’s Paths was the first system to offer manually curated paths through a 
digital collection (Shipman et al., 1996). Based on a user requirements analysis (Goodale et al., 2011), the 
PATHS¹ system has been developed to support a number of activities to help users make sense of
Europeana,² including path creation by expert and non-expert users, path facilitation by teachers and
cultural heritage educators, and path consumption by students and visitors. 

In this paper we present an initial analysis of the paths that have been created with the second 
prototype of the PATHS system. Based on feedback from the first prototype (Fernie et al., 2012), the paths 
editing functionality was expanded, allowing users to create branching and complex paths. The question
that we address here is thus whether people use the expanded functionality and, if so, how this
affects the paths they create.


1 http://www.paths-project.eu/
2 http://europeana.eu/


2 Methodology


2.1 Sample 


Participants were selected by a non-probability convenience sampling method (Bryman, 2012). The main 
body of participants was recruited on a convenience basis via university staff and student volunteer email 
lists; additional expert participants were recruited on an ad hoc basis through existing contacts known to 
the evaluation team. 

In total, 34 participants (19 women) completed the full evaluation protocol. Of these participants, 
10 were classified as domain or subject experts. The other 24 were classified as non-experts (novices). 
Participants also rated their level of internet experience on a four-point scale: Advanced (74%), Intermediate 
(24%), Basic (2%), and No experience (0%). Participants’ ages ranged from 18-25 years (23.5%), 26-35 
(23.5%), 36-50 (23.5%), 51-65 (23.5%), to over 65 years (5.9%). 
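
As a reading aid, the reported age percentages can be converted back into approximate head counts. The short Python sketch below assumes that each percentage is a share of the 34 participants who completed the protocol; the counts it produces are inferred from the rounded percentages and are not figures reported by the authors. The variable and function names are illustrative only.

```python
# Assumption: every percentage is a share of the 34 participants
# who completed the full evaluation protocol.
TOTAL = 34

# Age-group shares as reported in the text (percent).
age_shares = {
    "18-25": 23.5,
    "26-35": 23.5,
    "36-50": 23.5,
    "51-65": 23.5,
    "over 65": 5.9,
}


def implied_count(percent: float, total: int = TOTAL) -> int:
    """Nearest whole number of participants implied by a rounded percentage."""
    return round(percent / 100 * total)


counts = {group: implied_count(share) for group, share in age_shares.items()}

for group, n in counts.items():
    # Recompute the share from the inferred count as a consistency check.
    print(f"{group}: {n} participants ({n / TOTAL:.1%})")
print("total:", sum(counts.values()))
```

Under this assumption the inferred counts sum to 34, so the age breakdown is internally consistent with the stated sample size.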


2.2 Study design 


To investigate this study’s research question, an experiment was conducted in which participants were 
asked to use the PATHS system under controlled laboratory circumstances. During the evaluation, 
participants were asked to complete five short navigational and information-seeking tasks to familiarise 
themselves with the mechanics of the system, including finding and following paths, and finding and 
collecting individual items. The main task (30 minutes) was a creative and exploratory simulated work task, 
informed by the Interactive IR evaluation framework (Borlund, 2003): participants were asked to create a
path based on a historical or art-focussed topic in order to stimulate discussion and to encourage further 
use of cultural heritage resources. 

Participants subsequently completed an online feedback questionnaire and were interviewed on a 
semi-structured basis (15-30 minutes) about their experience. All of the data collection instruments are 
available as appendices in Griffiths et al. (forthcoming). 


3 Results 


3.1 Path Structure 


All of the paths created by participants were manually classified into three types, depending on the nature 
of their structure. Linear paths (24%) have at most one branching node, which is defined as a place where 
a user could follow two items from a single item. Branching paths (29%) have two or more instances of 
branching nodes. Complex branching paths (47%) have at least one instance of a branching node off of a 
branching node. Examples of all of the types of paths created by participants are shown in Figures 1 to 4.
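
The three definitions above can be expressed as a simple rule over a path's tree structure. The paths in this study were classified manually, so the following Python sketch is only an illustration of the stated definitions, not the procedure used in the evaluation. It assumes a path is given as a mapping from each node to the list of nodes that follow it, and it reads "a branching node off of a branching node" as a branching node that is a direct child of another branching node; both the representation and that reading are assumptions.

```python
from typing import Dict, List


def classify_path(children: Dict[str, List[str]]) -> str:
    """Classify a path as linear, branching, or complex branching.

    `children` maps each node to the nodes that directly follow it
    (a hypothetical representation chosen for this sketch).
    """
    # A branching node offers two or more onward items.
    branching = {node for node, kids in children.items() if len(kids) >= 2}

    # Complex: some branching node is a direct child of a branching node
    # (assumed interpretation of the definition in the text).
    nested = any(kid in branching for node in branching for kid in children[node])
    if nested:
        return "complex branching"
    if len(branching) >= 2:
        return "branching"
    return "linear"  # at most one branching node


# Toy paths (assumed data, not taken from the study).
linear_path = {"a": ["b"], "b": ["c"], "c": []}
branching_path = {
    "a": ["b"], "b": ["c", "d"], "c": ["e"], "d": [],
    "e": ["f", "g"], "f": [], "g": [],
}
complex_path = {"a": ["b", "c"], "b": ["d", "e"], "c": [], "d": [], "e": []}

labels = [classify_path(p) for p in (linear_path, branching_path, complex_path)]
```

The complex case is checked first because a nested pair of branching nodes also satisfies the "two or more branching nodes" condition.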


Figure 1: Example of a Linear path: Horizontal


iConference 2014 Jen Smith et al.


Figure 2: Example of a Linear path: Vertical


Figure 3: Example of a Branching path


Figure 4: Example of a Complex Branching path


The use of branching hierarchical structures in the path allowed for more complex narratives to be
constructed, and 23% of paths were ordered by narrative or story. Other organisational schemes were
thematic (50%), chronological (9%), by location (6%), by “importance” of items (3%), and no particular
order (6%).


3.2 Age 


Older participants tended to create simpler, more linear paths. No participants
aged 18-25 created linear paths, but 25% of participants aged 26-65 years and all participants older than
65 years created linear paths. Participants aged 18-25 also had the highest percentage of complex branching
paths (62.5%). Furthermore, age is negatively associated with both the total number of nodes participants 
included in their paths (r = -.38, p = .029) and the number of titles they changed (r = -.38, p = .028). 
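
The significance of a Pearson correlation can be checked from r and the sample size using the t statistic t = r·sqrt(n − 2) / sqrt(1 − r²) with n − 2 degrees of freedom. The Python sketch below is a rough consistency check only: it assumes that all 34 participants contributed to each correlation (the text does not state the n used) and it works from the rounded value r = -.38. The function names are illustrative, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule rather than by a statistics library.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value: 1 minus the probability mass in [-|t|, |t|].

    The central mass is approximated with Simpson's rule (steps must be even).
    """
    a, b = -abs(t), abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - total * h / 3


# Assumption: n = 34 (all participants); r is the rounded value from the text.
n = 34
t = t_statistic(-0.38, n)
p = two_tailed_p(t, n - 2)
print(f"t({n - 2}) = {t:.2f}, two-tailed p = {p:.3f}")
```

Under these assumptions the p-value falls between .02 and .05, which is consistent with the reported values of .029 and .028; small differences are to be expected because r is rounded to two decimals.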


3.3 Gender


Overall, female participants created proportionally more linear (26%) and branching (32%) paths than male
participants (20% and 27%, respectively), whereas male participants created proportionally more complex
branching paths (53% versus 42%). We also found that women added a greater number of descriptions
(approximately 40% more) to individual nodes than men.


3.4 Internet experience and domain-specific knowledge 

As might be expected, the more internet experience participants had, the more likely they
were to add text nodes (an aspect of PATHS functionality that is relatively non-obvious). No users with
basic internet experience added text nodes, but 29% of intermediate and 46% of advanced users did. Further, 
only advanced internet users included “composite” nodes in their path. A standard path node consists of a 
single item; composite nodes are created when an entire page of search results or thesaurus topic items is 
added as a whole to a user’s workspace. No domain experts used these information-rich but specificity-poor 
“composite” nodes. Figure 5 shows a standard path node; note the rich metadata in the “About the original 
item” section. Figure 6 shows a composite path node.


[Screenshot of a standard path node, “Dragons’ Gate” (VADS Collection: Public Monuments and Sculpture
Association), showing the node description and the “About the original item” panel with title, country,
language, category, provider, and rights metadata.]


Figure 5: Example of a standard path node.


[Screenshot of a composite node for the thesaurus topic “Arts and Crafts Movement”, with no providers,
contributors, categories, or keywords listed.]


Figure 6: Example of a composite node (based on a thesaurus topic)


4 Discussion

It seems that age, gender, internet experience, and domain knowledge all have a role to play in 
understanding how people use the PATHS system and create trails or paths. Table 1 shows which user 
characteristics have shown an influence on path creation behaviours. 


                             Age   Gender   Internet      Domain
                                            experience    novice/expert
Path structure                X      X
No. of nodes                  X
No. of titles changed         X
No. of descriptions added            X
No. of text nodes added                         X
No. of composite nodes                          X             X

Table 1: User characteristics that influence path creation behaviours 


Given the system’s computer-based nature, it is unsurprising that older participants tended to create simpler 
and less feature-rich paths. Age of user could be a key concern when PATHS moves beyond the prototype 
stage. Similarly, it was observed that more advanced internet users tended to include more complex nodes 
(both textual nodes and composite nodes). Perhaps because they reflect a lack of discernment, composite 
nodes, which include much immaterial information, were spurned by expert users. 

Gender seemed to be related to two PATHS behaviours: adding descriptions and structuring paths. 
First, women added more descriptions to their nodes than did men. Second, men created proportionally 
fewer linear and simple branching paths than women, but proportionally more complex branching paths. 
This difference may reflect a fundamental psychological distinction between men and women. Systemising 
is an individual-difference dimension defined as the drive to analyse or construct systematic relationships 
in non-social domains (Baron-Cohen et al., 2003). Men have consistently been shown to score higher on this 
dimension than women, which has been conceptually linked to the degree to which people engage with 
activities such as car repair or computing. Baron-Cohen et al. have also suggested that it is associated with 
the desire to build and perfect collections of items. The PATHS system is fertile ground for the manifestation 
of systemising traits, and the task given to participants essentially requires them to build a collection of 
items. Given this, it is unsurprising that men were more likely to create more structurally complex paths. 
In the post-task interview, one male participant declared “I was organizing [the nodes] similarly to the way 
they appeared originally in the menu, so I was following that structure”. Another male participant said “I 
wanted to get to the end of [the path creation task] to show that I had understood it”. 

When asked why they added two pages of search results and two sets of thesaurus topics as nodes 
in a path, one participant replied “I was thinking, ‘Somebody else is going to use this and come across it, 
so if they are looking for Monet, they might get part way down the path and want related artists’. And 
instead of having to go down and bookmark every single one, it was easier to do the search”. Another 
participant added everything they could find on the chosen topic as a composite node because they felt the 
selection was limited, so they wanted to capture all of the available data. 


5 Conclusion 

This study has brought to light a number of important user characteristics that must be considered for 
future iterations of the PATHS system. However, further evaluations are still necessary. For example, will 
the observed differences persist with a larger sample size, and when participants use the system in a more 
naturalistic setting, such as an extended field trial? In addition, this study is based on data derived from a 
task in which users generated their own paths. It has yet to be seen whether these results will generalise to
situations where users follow paths created by others. 
Abstract

This study focuses on modeling people’s perception of places and how those perceptions are affected by
cultural differences. Cultural background influences the way people feel and think about many objects.
How people recall and remember information also varies when their cultural backgrounds differ. However,
it is unclear how cultural background influences individuals’ perceptions and information behaviors
regarding a geographic area, such as a town or neighborhood. One way that individuals’ cultural
backgrounds may vary is in how the culture they are most closely associated with handles routine
communication. Hall’s high/low-context model (1976) holds that cultures differ significantly with
respect to how the messages involved in everyday communication are structured and interpreted, so this
model will be used to represent cultural background. Also, the ways people perceive urban areas are
categorized into landmarks and physical addresses. For this study, we will conduct an online survey and
have subjects play a short online quiz game.
1 Introduction


Newcomers to an area often face difficulties arising from social, cultural, and economic differences. These
general issues are further complicated because they are unfamiliar with their new environment. Not only 
must they adjust to a new culture and society, but they must do so while learning about a new place. 
Various information tactics have been suggested to support immigrants’ information behaviors (Lingel, 
2011). The way people perceive a new area can be a key in helping them seek, use, and share information 
so that they can successfully adapt to society. Facilitating the process of becoming familiar with an area 
would be helpful for newcomers adapting to a society, but for these efforts to succeed, the information
tactics must take into account how people understand unfamiliar places.

In this work, we focus on modeling people’s perception of places and how those perceptions are 
affected by cultural differences. Cultural background influences the way people feel and think about many 
objects. How people recall and remember information also varies when their cultural backgrounds differ 
(Kim, 2013). However, it is unclear how cultural background influences individuals’ perceptions and 
information behaviors regarding a geographic area, such as a town or neighborhood. When immigrants or 
visitors from China and Germany come to Washington D.C., they normally investigate the area in different 
ways for a period of time to get familiar with the city. During this period, do they perceive the area 
differently because of their different cultural backgrounds? If so, how does cultural background affect 
whether people can benefit from information that they encounter during this time of adaptation? Answering 
these questions would inform our understanding of how individuals adapt to new urban areas, allowing city 
planners, software developers, and researchers to better design information resources and systems for 
newcomers. 

In order to conceptualize and measure individuals’ cultural background, Hall’s high- and
low-context model (1976) is adopted. Also, the ways people perceive urban areas are categorized into
landmarks and physical addresses. In the following sections, we first present the concepts of the high- and
low-context model and landmarks. Subsequently, we show the design of the experiment, limitations of our
study, and our work to be done by the conference.


2 Concepts 


In the computer-supported cooperative work (CSCW) literature, the term “space” means a Euclidean
structure composed of shapes or colors. The concept of “place” includes not only three-dimensional structure,
i.e., “space,” but also recognizable and persistent traits that provide cultural and social meanings (Dourish,
2006). When we use the terms “space” and “place”, we follow these CSCW concepts.

One way that individuals’ cultural backgrounds may vary is in how the culture they are most
closely associated with handles routine communication. Hall’s high/low-context
model (1976) holds that cultures differ significantly with respect to how the messages involved in everyday
communication are structured and interpreted. High-context individuals usually assume that the social and
physical context contains most of the relevant information, leading to very little information being
included in the coded part of the message. Low-context individuals, on the other hand, tend to make fewer
assumptions about the general availability of information, leading to messages in which more of the relevant
information is explicitly present (Hall, 1976). Of course, there are many other cultural models, such as
Trompenaars’ seven-dimensional model (1993) and Murdock’s universal cultural traits (1965). One
especially worth mentioning is Hofstede’s cultural model (Hofstede and Minkov, 2010). It defines
five factors to explain cross-national databases based on surveys of IBM employees from 70 countries
(Hofstede and Minkov, 2010). This model provides measurable metrics such as individualism/collectivism
or power distance to rate people’s cultural features, but its data may be biased because the
respondents were drawn from a single group, i.e., IBM employees. Hall’s classification, in contrast,
does not provide specific measures such as national scores or points, but it defines “general
all-compassing” terms about individuals’ cultural characteristics without having biased criteria (Straub et
al., 2002). This means that, if we have reasonable measures or protocols that can determine each individual’s
cultural features, Hall’s model has the advantage of yielding less-biased data.

An important aspect of how some individuals perceive an area is landmarks. Landmarks are
representative objects that are perceived to be in an area (Sorrow, 1999). When people think of ‘urban
places’, they may recall cognitive, structural, or visual aspects of the space. These features are organized as
‘landmarks’ (Sorrow, 1999), so it would be meaningful to see whether people perceive places in terms of
landmark features or in another form, such as physical addresses. If respondents recognize the area as a ‘place’
or a ‘space’, they would perceive it as one of the landmarks. If they do not recognize it as a space
or place, they might remember it by addresses or text-oriented entities. Landmarks can be reasonable
measures since they capture not only visual memory but also urban characteristics.

While the implications of cultural differences for spatial cognition are unclear, prior work that
has found systematic variation in information behaviors and processing implies that culture may affect how
people perceive and work with information about new places. Schmitt et al. (1994) studied different ways
of memorizing across different language groups and found that Chinese-speakers tend to use visual memory
more than English-speakers when they recall information. This suggests that there is also likely to be
systematic variation among individuals from different cultures with respect to how they use (or don’t use)
landmarks, addresses, and other forms of information for learning about, understanding, and describing new
places.
3 Our Approach


We use two approaches to examine the relationship between the cultural background and perceptions of a 
place: an online survey to assess individuals’ cultural context and a quantifiable web-based game to 
determine how they perceive places. These web-based methods have advantages with respect to sample size, 
sample heterogeneity, and cost-effectiveness compared to lab-oriented approaches (Reips, 2000). 

The survey allows us to distinguish high- and low-context individuals. As suggested by Straub et
al. (2002), each individual has complex cultural features that cannot be strictly determined by simple
demographic indicators such as nationality or gender. This leads us to develop protocols that can identify
a subject’s cultural characteristics, specifically the degree to which they have tendencies consistent with
high- and low-context cultures. Existing multi-item measures for high- and low-context orientation
(Herselman et al., 2011) have been modified for this study. Pilot tests are being conducted to evaluate the
external and internal validity of the adapted measures, and the final version will be used in an online survey.

Respondents who complete the online survey are then directed to the next step: a web-based game.
In the game, participants are shown a series of photos from a target area’s streets and are asked to indicate
where each one is located. In each case, individuals will be given three types of options for answering the quiz:
a list of street names such as North East 11th, a list of landmarks such as a building name, and ‘Don’t
know’. The photos are drawn from Google Street View. The basic mechanism of the game is based on
‘UrbanOpticon’, which was developed for and proved to be quantifiable in prior work that examined the
recognizability of London’s streets (Quercia, 2013). Main concerns in designing the game include
quantifiability, measurement of respondents’ familiarity with the targeted area, and randomization of
quizzes to minimize learning effects. This online research will be conducted with people recruited from
different cultures living in and around the District of Columbia. The specific location in the city to target will
be determined after examining patterns of landmarks and streets, so that both elements co-exist
in the area in good balance; i.e., places where one feature is dominant will be avoided, such as
the White House area, in which famous landmarks dominate.
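
The randomization of quizzes mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the UrbanOpticon implementation: the function name `build_quiz`, the dictionary keys (`id`, `streets`, `landmarks`), and the use of a per-participant seed are all hypothetical choices made for the example.

```python
import random

DONT_KNOW = "Don't know"  # the third answer option described in the text


def build_quiz(photos, seed=None):
    """Return quiz items in random order, each with shuffled answer lists.

    `photos` is a list of dicts with hypothetical keys:
    "id", "streets" (candidate street names), "landmarks" (candidate landmarks).
    """
    rng = random.Random(seed)   # separate generator; a seed makes a run reproducible
    order = list(photos)        # copy, so the caller's list is not reordered
    rng.shuffle(order)          # random photo order to reduce learning effects

    quiz = []
    for photo in order:
        streets = list(photo["streets"])
        landmarks = list(photo["landmarks"])
        rng.shuffle(streets)    # shuffle options so position gives no hint
        rng.shuffle(landmarks)
        quiz.append({
            "photo_id": photo["id"],
            "streets": streets,
            "landmarks": landmarks,
            "dont_know": DONT_KNOW,
        })
    return quiz
```

Using a separate `random.Random` instance per participant keeps one participant's ordering independent of another's, and storing the seed would allow a given session to be reconstructed during analysis.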

Likert-scale items are used to assess high- and low-context orientation in the questionnaires, so that
they can be used to construct a single measure instead of categorizing people into two distinct groups. For each
survey, an average score will be calculated; this score shows where an individual’s cultural tendency lies between
high- and low-context cultures. Each individual’s score from the survey will be plotted against the game
results. Both the accuracy and the type of game answers will be considered. The proportion of answers given in
terms of street names vs. landmarks will be used as an indicator of how individuals perceive and think
about the area. Accuracy will be used to assess the subjects’ awareness of the area. These data will be
analyzed with ANOVA to determine if there is a statistically significant relationship between cultural
background and spatial cognition.
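
The per-participant measures described above might be computed along the following lines. This is a minimal sketch, not the study's actual scoring procedure: the function names, the answer labels (`"street"`, `"landmark"`, `"dont_know"`), and the decision to count ‘Don’t know’ as incorrect when computing accuracy are assumptions made for illustration, and reverse-coded survey items are not handled.

```python
from statistics import mean


def context_score(likert_responses):
    """Average the Likert items into one continuous high/low-context score."""
    return mean(likert_responses)


def game_summary(answers):
    """Summarize one participant's game answers.

    `answers` is a list of (answer_type, is_correct) pairs, where answer_type
    is "street", "landmark", or "dont_know" (hypothetical labels).
    """
    total = len(answers)
    street = sum(1 for kind, _ in answers if kind == "street")
    landmark = sum(1 for kind, _ in answers if kind == "landmark")
    attempted = street + landmark
    # Assumption: a "dont_know" answer never counts as correct.
    correct = sum(1 for kind, ok in answers if kind != "dont_know" and ok)
    return {
        # share of attempted answers given as landmarks rather than street names
        "landmark_share": landmark / attempted if attempted else None,
        # share of all quiz items answered correctly
        "accuracy": correct / total if total else None,
    }
```

Each participant would then contribute one `context_score` value and one `game_summary` result, which is the pairing the text proposes to plot and test.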


4 Limitations 


As with any empirical study, this work has limitations which must be addressed. Landmarks, in general,
and answer types, in particular, are only indicators of individuals’ perception of a place. Interviews with pilot
participants will help validate that this operationalization is reasonable and appropriate. The public nature
of the survey and games may result in frivolous respondents and a high dropout rate. In order to deal with
this, subjects will be told how long the study takes, a warm-up phase will be inserted, and an explanation of the
research will be provided (Reips, 2000). Targeted recruiting methods will also be considered so that the risk
of spurious and insincere participation can be minimized. Since road conditions, traffic policies, and
addresses vary among countries, results may be dependent on people’s home countries. This can be a critical
confounding factor in the study. Developing a measure that distinguishes high- and low-context individuals
will help minimize the effect of this confound by allowing us to examine each individual’s tendency and
nationality as separate factors.


5 Conclusion 


This work addresses a basic question about how people adapt to new urban areas. Since newcomers have to deal
not only with a different culture and society but also with learning new places, well-designed information tactics
would be crucial to guide and help them. For information tactics to succeed, it is critical
to understand individuals’ perceptions and information behaviors regarding a geographical
urban area. By clarifying how cultural background influences people’s perception of an urban area,
this research will provide a basis on which researchers, city planners, developers, and
governors can design better information tactics for newcomers. Even though it is not easy to
measure cultural characteristics and human perceptions, the verification processes are designed to
quantify representative models and concepts while minimizing biased elements in those models. To strengthen the
credibility of the empirical study, we plan to expand the experiments gradually. First, we plan to
conduct a test and analyze data for both the survey and the web-based game by February 2014. In this
test, we will target a small group of approximately 20 people to check the validity of the experiments as
well as to show intermediate results. Based on the results, research protocols or interfaces will be
refined to minimize errors and confounding factors. Then, we will recruit more people to collect a reasonable
amount of data for further experiments.
Abstract

This exploratory study sets out to better understand the information practices of academic library staff 
members engaged in digitization projects. Using semi-structured interviews and workplace observation, 
the research considers digitization within the framework of practice theory. Analysis of the data suggests 
that the use of information sources and the role of embodied knowledge in digitization work depend on
the relationship between the media formats being digitized and the level of development of standards
and policies for digitizing those formats. The research also suggests a variety of new avenues of inquiry
for further conceptualizing digitization as information practice in archives and libraries, including the
relationship between aesthetics and situated judgment in the creation of digitized library materials, and
the interplay between standards-following and improvisation supported by self-documentation.
1 Introduction

Digitization projects are currently being conducted on a massive scale, with projects like the Library of
Congress’s American Memory project, the Google Books project, and the National Jukebox project
(containing over 10,000 digitized 78 RPM recordings from the years 1900 to 1925) offering expanded access to
collections of various types of previously restricted materials via digital library systems. Scholarly access to
preserved documents has been greatly enhanced through digitization, yet the sites where digitization takes
place have seen little in-depth qualitative analysis. This research project uses practice theory and qualitative 
methods to focus on one key site of digital production, a digitization lab at a research library (at a large 
university in the Northeastern United States). 


2 Related Research 


Past research on digitization practice has typically employed case studies of practical applications (e.g.,
Berger, 1999; Capell, 2010; Evens and Hauttekeete, 2011), or involved recommendations for standards
development (e.g., Fleischhauer, 2010; Plichta and Kornbluh, 2002; Teper and Shaw, 2011). Lopatin’s (2006)
survey of literature on digitization projects from 2000-2005 suggests that studies on digitization tend to
focus on such pragmatic issues as project management, funding sources, selection of materials for
digitization, legal issues, metadata creation, interoperability, and digital preservation. In their review of existing
literature on digital libraries, Landis and Chandler (2007) note that much of it “focuses on technical issues
and solutions, and is driven by a ‘proof of concept’ research approach” (p. 2). Few researchers have pursued
more theoretical approaches to digitization practices. While a variety of researchers have considered other 
facets of digital library development, such as Dalbello’s (2005) social constructionist study of digital library 
administrators and institutional practices, and Rosenbaum and Joung’s (2004) use of the concept of socio- 
technical interaction networks as a model for understanding digital library enrollment, the key site of dig- 
3 Research Questions


• How do digitizers use information sources when digitizing materials for preservation?
• How do digitizers enact embodied knowledge in their digitization practice?
• How do digitizers enact standards at the local level through their practice?


4 Method and Methodology 


4.1 Research Design 


This exploratory research focuses closely on three library staff members engaged in digitization activities. 
Participants selected had similar graduate-level educational backgrounds, and similar job responsibilities 
(project management and digitization of special collections). Through the process of observation and
interviewing, data were collected on the participants’ digitization tasks, their use and creation of information
resources, and their manipulation of a multi-faceted environment of physical and digital tools. 


4.2 Theoretical Framework 


The theoretical approach for this project is based on practice theory, which takes the basic unit of analysis 
as “sets of actions that are based on the interconnectedness of their various nonreducible elements” (Veinot, 
2007, p. 159). In addition, “practice theory contains a unique understanding of the body, which ‘highlight[s]
embodied capacities such as know-how, skills, tacit understanding, and dispositions’ as the basis of activity 
(Schatzki, 2001)” (Veinot, 2007, p. 160). This focus on forms of embodied knowing helps practice theory to 
move information studies research away from cognitivist understandings of individual action to considering 
social actors as operating “within expectations or ‘the accountability of a shared way of doing’ (Corradi, et 
al., 2010, p. 277) set up in a practice” (Cox, 2012, p. 177). 

Cox (2012) suggests practice theory has seen relatively limited use in information behavior research 
outside of work in knowledge management (p. 186), and Moring and Lloyd (2013) suggest that practice 
theory is “still emerging in the information studies field” (para. 1). Practice theory has seen ongoing use in 
information literacy research, particularly in the work of Annemaree Lloyd (2009, 2010, 2011, 2012) and 
others (e.g., Sundin and Francke, 2009; Tuominen, Savolainen and Talja, 2005). Practice theory is seen as 
useful in this research because it enables a conceptualization of information literacy “as something that 
develops in social contexts and is specific to a particular community” (Talja and Lloyd, 2010, xii). Practice 
theory has also seen fruitful application to studying the information practices of blue-collar workers (Veinot, 
2007). Sundin and Francke (2009) define practice as the “various manifestations of repeated activities, 
including historical, social, cultural and material ones” (para. 6). The emphasis on the materiality of practice 
is significant for this research. Rather than focusing on processes of information seeking in the cognitivist 
framework, practice theory opens up the field of information-related phenomena to include the physical 
manipulation of tools and documents in virtual and physical spaces, as well as emphasizing the specificity 
of interfaces, textual genres, etc. In this project, practice theory allows us to study digitization as an
embodied and situated activity (following Suchman, 1987), requiring improvisation within dynamic conditions
and utilizing an array of tools and documentary techniques. 


4.3 Data Collection 


Participants were recruited with the help of the manager of the library digitization lab. Each participant 
had recently graduated with a Master of Library and Information Science degree from the same library
science program, and had been working at the library as a part-time employee for a year or less. Each
participant was working independently on a different preservation project. Data were gathered from the
participants via semi-structured interviews, running 25 minutes to an hour, using a set of nine interview
questions, based on Veinot (2007). In addition, participants were observed for up to two hours, conducting
their everyday digitization tasks, and asked to describe each step of the workflow and give examples of
problems that had developed in the past and what information sources they utilized to solve them.


4.4 Data Analysis Procedures 


The audio from the interviews and observation sessions was recorded and transcribed by the researcher 
immediately after each session. Coding of emergent themes began in the transcription process. In addition, 
the audio, images and notes produced during the observation sessions were transcribed and integrated with 
the existing data, identifying participants’ uses of tools and their use and production of documentary forms. 
Initial open coding was conducted during the transcription process to begin to immediately identify
emergent themes. After transcription and initial coding, the data were further analyzed to elaborate major
themes using constant comparative methods (Corbin and Strauss, 2008). Additional analysis focused on
identifying the specific tasks, tools, problematic situations, and information sources used within each
participant’s workflow. These elements of the workflow were compared between participants to identify
common and divergent themes across each type of digitization project.


5 Findings/Analysis 


5.1 Institutional Context of Digitization Work 


Participants in the digitization lab are responsible for handling a variety of tasks, including retrieving library 
materials, digitizing materials, transcoding between digital formats, conducting quality control on resulting 
digital copies, creating metadata, and ingesting files into the digital repository for access and preservation. 
The ongoing digitization work in the lab is typically structured and monitored by workflow software, but 
breakdowns in the system and the needs of specialized, ad hoc projects require staff members to improvise 
documentation techniques (e.g., forms, policy documents, spreadsheets, logbooks, technical specifications 
for equipment settings) for keeping track of their current position in the project and noting any
modifications they may make to established procedures. These documentation techniques help participants manage
their work, but they also act as a means of consciously transmitting their knowledge to workers who
may hold their position in the future.


5.2 RQ 1: How do digitizers use information sources when digitizing materials for preservation? 


Digitizers use standardized imaging targets and default settings to implement standardized baselines for 
digital scanning, and they are instructed by senior administrators not to manipulate tonal or chromatic 
relationships once documents have entered the digital domain. However, a digitizer must rely on her own 
aesthetic judgment (discussed below) for making decisions about the quality of digital output. Participants 
overcome problematic situations by searching for illustrative models of how others in the wider library 
community have overcome similar obstacles, following the online guidelines of such organizations as the 
Online Computer Library Center for metadata, as well as looking back to earlier records in the library’s 
catalog as guides. 


5.2.1 Aesthetic judgment 


Participants displayed a form of embodied knowing that I have tentatively labeled aesthetic judgment, in 
which individuals integrate educated perceptual abilities and situated knowledge to come to a decision 
about the acceptable visual quality of digitized copies. The use of targets and color swatches takes away 
some of the need for aesthetic judgment, replacing it with objectively established measures, but other 
attributes such as depth of field, angle of lighting, etc. lack established standards or guidelines, so aesthetic 
judgment is required. For instance, Participant B explained the limitations of this process of making
aesthetic judgments: “Sometimes this is as good it gets. I hate to say it. Some of the blurring on the items
seeks to assist with that three dimensional feeling.” 

One unexpected finding was the creative use of color in various places in the workflow, which
further suggests that participants are engaging in a practice that has a significant, and unexplored, aesthetic
dimension. For instance, two of the participants use improvised systems of color coding as an important
strategy for managing their file systems.


5.3 RQ 2: How do digitizers utilize embodied knowledge in the practice of digitization?


Participants commonly employed embodied knowledge for a variety of tasks, including assessing quality 
output, determining when files were done being processed, setting up items for digitization, and creating 
metadata records. These embodied practices draw on visual, aural and tactile ways of knowing to implement 
tacit knowledge gained through training and experimentation with equipment. For instance, in digitizing 
three-dimensional artifacts, Participant B employs methods of weighing and measuring with instruments
and her hands to gather information in order to fill metadata fields in the digitized item’s record, as well 
as to assess how difficult it will be to position a particular object for digitization. 


5.3.1 Improvised self-documentation 


An emergent theme I have termed improvised self-documentation, in which participants create their own 
forms and documents to record the improvised actions and decisions that they enact in their practice, 
produces important information sources utilized by participants and others in the lab. New knowledge about 
innovative digitization processes is preserved through self-documentation, helping digitizers to manage
personal workflow organization, transmit new knowledge to other digitizers, and communicate progress to
supervisors. In addition, the materials produced through self-documentation, such as application profiles,
user guides, and logs of digitized materials, appear to serve as de facto local standards, working to ensure
continuity in practice over time.


5.4 RQ 3: How are standards enacted in preservation practice?

Participants self-consciously monitor their personal actions, and the actions of others working with them
(i.e., work-study students), for correspondence to accepted local standards. Participants are recent graduates
from the same Master of Library and Information Science program, suggesting that their common
self-conscious adherence to library norms may be tied to a shared educational background.

Once activities become “rote”, internalized, and fully embodied, referring to standards and
information sources typically becomes less frequent, until a problematic situation develops. For simple text
documents, standards are strict, and all settings have been stored as defaults or presets in the hardware
and software of the system. For other materials, such as three-dimensional artifacts, videos, audio
recordings, or maps, standards for digitization and metadata creation are seeing ongoing development. The
participants indicated a tension between the standards that they have agreed to follow and what they
sometimes would like to do to improve the quality of the digital output. For instance, Participant B admits
that “sometimes I ever so slightly fudge it in Photoshop, but we’re really not supposed to do that. We are
not supposed to beautify these [digitized objects]. We [only] remove distracting background elements.”
Beyond effacing marks of the context of image scanning (removing the background elements from the final
image), modification to the image after digitization is avoided. At other times, standards are followed
without question, or are directly embedded within the technology in the form of defaults or presets that
cannot be circumvented without technical intervention from administrators.


6 Summary 

These findings suggest that digitization is an information practice that operates under varying conditions 
of standardization, improvisation, self-documentation and embodied knowledge. The lab’s workflow
software precisely defines and tracks each step in the process of scanning text-based documents, but other
formats require varying degrees of deviation from the rule-based workflow model. Yet, even in the highly
rigid structure of the standards for digitizing text, participants still rely on embodied knowledge and tacit
understanding to make judgments about whether items were scanned properly or not, oftentimes
improvising creative uses of color-coding in the file system to facilitate quick access to current files.


6.1 Limitations 


The major limitation of this project is the small sample size of three participants. Additional participants 
and institutional sites will certainly help to enhance the validity and scope of these findings. 


6.2 Future Research 


This work is exploratory, and it is hoped that the patterns in digitization practice that emerged from the 
data can be further explored in a broader study. This research opens up the following avenues for future 
research: 
• Exploring aesthetic judgment and the role of color in digitization practice and, more generally, in
human information behavior.
• Further examining the role of texts produced through self-documentation in the social construction
of digitization practices.
• Exploring how standards and self-documentation interact to socially legitimize (cf. Byström &
Lloyd, 2012) digitization practice.


7 Conclusion


While the use of practice theory in information research is “still emerging” (Moring and Lloyd, 2013), this 
research indicates the applicability of a practice theory approach to conducting research on information- 
related activities in the digitization labs of academic libraries. This research offers future directions for 
research into expanding theoretical understanding of digitization as a complex information practice that 
draws on multiple modalities of knowing, including the emergent theme of aesthetic judgment, which has 
not seen exploration in the field of Library and Information Science. This research also contributes to the 
conceptualization of the role of standards in local practice. 

In addition, better understanding of digitization practices in academic libraries also points to
practical implications, such as providing insight into how systems developers might create workflow software
that can better assist staff members’ changing documentation requirements when digitizing different media 
formats. 

In terms of enhancing theoretical understanding of digitization, this exploratory research suggests 
that practice theory offers a rich qualitative lens through which to conceptualize digitization as a dynamic 
and embodied practice, instead of as a simple, specifiable task. 
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Abstract 

In the United States, primary and secondary (i.e., K-12) schools are rapidly changing their technology 
infrastructures to comply with sweeping federal mandates for enhanced data utilization and digital 
learning. These changes provide information science researchers with a unique opportunity to apply 
informatics constructs to the study of K-12 organizations. In honor of the 2014 iConference theme, this 
poster breaks down disciplinary walls in a map of K-12 education informatics elements in which the 
researcher 1) took inventory of the research-derived knowledge of technology use in the K-12
environment; 2) reviewed major empirical approaches to technology infusion in K-12; and 3) proposed a 
model for K-12 education informatics that may be useful for future research and professional learning. 
This depiction of information, information systems, and information technology research (i.e., core 
elements of informatics) is intended to foster empirical model development, inspire future research, and
provide considerations for professional learning.
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1 Introduction 


The goal of the United States’ National Education Technology Plan is to “enable engaging individual
learners’ personal interests by connecting web learning resources to learning standards, providing options 
for adjusting the challenge level of learning tasks to avoid boredom or frustration, and bridging informal 
and formal learning in and outside of school” (Office of Educational Technology, 2010, p.17). While the 
Plan’s objectives are timely and important, less clear is how to bring these myriad forces together and how 
to use this confluence as a springboard for research and knowledge building. The poster uses a conceptual 
exploration of K-12 education informatics to illustrate and begin to define and connect its components to 
the values and work of the iSchool community. 


2 Defining and Exploring K-12 Education Informatics 


Benyon-Davies (2007) defined informatics as “a convenient umbrella term to stand for the
overlapping disciplinary areas of information systems, information management and information 
technology” (p.306) and later refined these core components to “information, information systems and 
information technology” (Benyon-Davies, 2009, p.92). With these ideas in mind, this poster will depart from 
Benyon-Davies’ conclusion that informatics serves in “support of coherent decision-making and action,” to
examine that notion in the context of K-12 (i.e., primary and secondary schooling) educational 
organizations. 

The term “education informatics” is used to describe different aspects of information technology as 
applied to, practiced in, or recommended for future implementation in education, teaching, and learning. 
The origin of the need for this sub-discipline likely was one of two events: 


1. In 2005, librarians associated with the top 50 schools of education in the United States convened 
with the goal to discuss ways to provide better access to education information. They concluded 
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that the discipline of education lacked, but needed, a formalized informatics program, similar to 
that found in fields such as health, to focus on the use of technology to solve education information 
problems. This group advocated for the field to be interdisciplinary with participants from 
information science, education, computer science, and other fields (Carr, Collins, O’Brien, Weiner, 
& Wright, 2010; Collins & Weiner, 2010) and for its researchers and practitioners to understand 
how federal, state, and local policy drives the interplay between technology and education (Carr & 
O'Brien, 2010); 

2. A U.K. research team led by Nigel Ford called for researchers to move beyond the concept of technology
integration, a notion that inherently assumes technology as external to the processes it enables, to 
education informatics (Ford, 2004), a “[s]tudy of the development and application of digital 
technologies in relation to the analysis, storage, manipulation, retrieval and use of information 
selected from multiple independent information sources, in relation to learning” (Ford, 2005, p.362). 
According to Ford (2008, p.ix), education informatics is the study of “the development, use, and 
evaluation of digital systems that use pedagogical knowledge to engage in or facilitate resource 
discovery in order to support learning.” While education informatics at the tertiary education level 
is a rapidly growing area for information science, none of the aforementioned originators have 
comfortably placed the learners into working definitions or determined how their definitions create an
agenda for the study of complex K-12 environments.

Given their extensive ranges of influences, structures, and functions, K-12 schools are considered highly
complex organizations (Etzioni, 1975). When technology is considered in the context of educational
organizations, its value emerges in its relationship to organizational activity mediated through information
systems.


2 Current Significance of Education Informatics


Recent forces have dramatically changed the internal structure and function of information and technology
in K-12 organizations in the United States, thus creating a unique impetus for developing a research agenda
for education informatics: the Department of Education’s Race to the Top (RT3) funding; and the Common
Standards Movement that includes the Common Core State Standards (CCSS), the Next Generation Science
Standards (NGSS), and the college and career readiness standards movements (Evans, 2012).

Figure 1: Closed loop learning components (diagram labels: “Closed-Loop Learning” or “Personalized
Learning”; Standards-Based Learning Objects; Standards-Based Assessment Items; Instruction; Analytics;
Assessment; Content)

RT3 applications require state and local education agencies to 
establish instructional improvement systems (IIS) through which student data, teacher profiles, learning 
resources, and assessment results are integrated to generate rapid, personalized feedback that allows teachers 
to individualize and differentiate instruction (Saldivar, 2012). These IIS data points create a closed loop 
among teaching and learning resources, instruction, and assessment that allows teachers to personalize 
learning for each student (Manderson, 2013). 

Fundamental to this process are a repository of vetted common standards-linked learning and 
assessment resources upon which to base instruction (U.S. Department of Education, 2013) and IIS data, 
instructional technology, virtual learning platform, digital textbook, and other learning systems 
interoperability. Unfortunately, there is little research on the extent to which K-12 schools will be able to 
incorporate data and tools to effect real change and realize common standards (Evans, 2012).
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3 Mapping K-12 Education Informatics 


Concept maps are highly effective graphical tools for representing new knowledge through the depiction of
two-dimensional node links that visually illustrate the relationships between concepts (Novak & Cañas,
2008). To create a concept map, a researcher begins with a focus question, which in this poster was “What
are the components of K-12 education informatics?” This question guides data collection; illustration and
linking; and model proposition. 

To compile data for the poster, I explored the interrelationships between information, information 
systems, and information technology by taking inventory of the research-derived knowledge and major 
empirical concepts or findings relating to technology infusion in educational organizations. Sources included 
sociocultural, sociotechnical, and technical research articles, legislation, and reports published between 1993 
and 2013. I reviewed the sources and identified the major findings or themes in each source. Then, the 
themes were sorted into three broad concept categories: information, information systems, and information 
technology. I then mapped each concept category and created links within and across categories. 

Using the CMap Tools software from the Florida Institute for Human and Machine Cognition¹, I
created and linked circles for each concept. Linking words or linking phrases were placed on a link line to 
specify the relationship between the concepts. The completed concept map is a proposed model for K-12 
education informatics useful for future research and professional learning. 
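
As a rough illustration of the structure just described, a concept map can be treated as a set of propositions, each joining two concepts with a linking phrase. The sketch below is only an assumption-laden example: the concept names, linking phrases, and function names are hypothetical and are not taken from the poster's actual map; only the three category names come from the text.

```python
from collections import defaultdict

# Hypothetical concepts, each tagged with one of the three categories named in the text.
CONCEPTS = {
    "student data": "information",
    "learning standards": "information",
    "instructional improvement system": "information systems",
    "broadband access": "information technology",
}

# Each proposition is (source concept, linking phrase, target concept).
# These example links are illustrative assumptions, not findings from the poster.
PROPOSITIONS = [
    ("student data", "is integrated by", "instructional improvement system"),
    ("learning standards", "are linked to", "student data"),
    ("broadband access", "enables", "instructional improvement system"),
]


def concepts_by_category(concepts):
    """Group concept names under their category, preserving insertion order."""
    grouped = defaultdict(list)
    for name, category in concepts.items():
        grouped[category].append(name)
    return dict(grouped)


def cross_category_links(concepts, propositions):
    """Return the propositions whose two concepts belong to different categories."""
    return [
        (source, phrase, target)
        for source, phrase, target in propositions
        if concepts[source] != concepts[target]
    ]


for source, phrase, target in cross_category_links(CONCEPTS, PROPOSITIONS):
    print(f"{source} --[{phrase}]--> {target}")
```

Separating links within a category from links across categories mirrors the mapping step described above, in which links were created both within and across the three concept categories.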


4 Conclusion 


To recognize the value and impact of technology to the educational system and to the learning process, 
leaders of educational organizations must ensure that their infrastructure accommodates the information 
needs of participants; education informatics may provide a framework for discernment. Many seminal 
informatics studies will also be interesting to replicate in a K-12 context. A greater understanding of K-12 
education informatics allows researchers to pursue rich research questions such as: 


• Who are the actors in a K-12 information environment? How do their contributions interrelate to
produce knowledge?

• What does a K-12 information ecosystem look like? What situational factors influence the success
of various components and how can this success be maximized?

• To what extent and in what ways is personalized learning in K-12 dependent upon affordances such
as technology and bandwidth access, professional skill, organizational policies, community norms,
and personal motivation? How do these affordances interrelate?


An exploration of the concepts undergirding K-12 education informatics can have profound implications for 
the initial and continuing education of information professionals in the iSchools. The obvious place in which 
K-12 education informatics fits is in a reassessment of the traditional school librarian preparation 
curriculum. All too often, school librarian programs are relegated to less highly regarded roles in many 
iSchools, despite the fact that most iSchools have them (Mardis, 2009). Enrollments in these programs are 
declining (Wallace & Naidoo, 2010) as professional school librarians’ positions are increasingly slated for 
elimination (Ellerson, 2009, 2010, 2012). With an improved understanding of the many disciplines to which 
K-12 education informatics is connected, iSchools faculty may be able to see ways in which to not only
evolve their school librarian preparation programs to programs that train education informaticists, but also 
transform them into programs that fit naturally into the iSchools research agenda as complements to 
coursework in social, health, community, and other areas of informatics education. 

Understanding the relationship between who we teach (learners), what we teach 
(curriculum/instruction), what we teach with (instructional resources), how well students are learning 


1 http://cmap.ihmc.us/
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(assessment), and who is teaching (staff) is essential if our schools are to benefit from technology. Federal 
policymakers’ push for better learning and teaching through IIS has been forward thinking but is largely
uninformed by the research heritage of informatics. By using the literature in areas such as educational 
technology integration, personalized learning, resource curation, and educational applications of broadband, 
the goal of this work is to move beyond the idea of technology integration to a framework from which 
scholarly research and professional learning can respond to this urgent and timely issue. 
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Abstract 

This poster will introduce a key educational collaborative project in libraries and museums between the 
U.S. MacArthur Foundation, the Institute of Museum and Library Services, the Urban Libraries Council,
the Association of Science-Technology Centers, the Chicago Public Library, and the Digital Youth
Network. Using this project, the authors will share how research on the use of technology by digital
youth has influenced practice in this collaborative venture; demonstrate how the basic research has led
to a broad-based understanding of connected learning that breaks down barriers for youth in a networked
world; validate the presence of information-provision persons in library learning labs; and explore what
this means from a university perspective for the education of 21st century librarians working with youth.
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1 A Key Educational Collaborative 


It all started with the MacArthur Foundation’s decision, as part of its Digital Media and Learning
Initiative, to try an experimental learning space for teens called YOUmedia. Opened in 2009, YOUmedia
occupies an open, 5,500-square-foot space on the ground floor of the Chicago Public Library's downtown 
Harold Washington Library Center. A dedicated member of the Chicago Public Library staff oversaw the 
project design and content. From there it spread to four additional Chicago Public Library spaces and then 
through a grant process to 24 additional public libraries and museums across the country. This expansion 
was made possible through funding from a partnership between the MacArthur Foundation and the Institute
of Museum and Library Services, founded upon the success of YOUmedia. The Urban Libraries Council
and the Association of Science-Technology Centers joined the collaboration to support the design and
implementation of the additional learning labs and to provide a community of practice, technical support, 
and cross-project collaboration. A prime purpose of these learning labs was to turn teens and tweens from 
consumers of media to creators — in essence to put youth into the driver’s seat. Nonetheless, the adults in 
the learning labs play an essential role as mentors for the youth; mentors came from library staff and the 
Digital Youth Network (DYN), another partner in the venture. The learning labs incorporate both 
traditional and digital technologies and serve as a prime example of the iSchool purpose of connecting 
people, information, and technologies. 


2 How Research Influenced Practice 


With a grant from the MacArthur Foundation, Professor Mimi Ito, a cultural anthropologist, and a cadre 
of other researchers set out to conduct a multi-year study of teens’ use of technology in both formal and
informal spaces with an emphasis on interest-driven use in information spaces. In 2009, the MIT Press
published the findings of their research in a book, also available free online as a PDF, called Hanging Out,
Messing Around, Geeking Out (Ito et al.), to represent the different levels of participation with digital 
media. The research was revolutionary in that it demonstrated that the same young person actually used
various technologies at different levels rather than adhering to just one use, as much previous research indicated.


iConference 2014 Eliza T. Dresang et al. 


In the hanging out stage, teens tend to spend more time talking with one another than engaging with the 
technology, although they may be hanging out with friends on Facebook or other digital media. In the 
messing around stage their interest in technologies increases but in more of an exploratory or casual manner, 
e.g., exploring games, than with a systematic or identified purpose. In the geeking out stage, however, 
teens are much more purposeful and engaged in the use of technologies and their use in the creative process. 
The Learning Labs often become maker-spaces. One of the questions to be answered in the YOUmedia 
learning lab was what made the difference for teens who were just hanging out and those who were geeking 
out. 


Figure 2: Youth Engaging in YOUmedia Learning Lab


At any rate, the design of YOUmedia encourages all three of these uses for teen participants. According to 
Ito, libraries are transformed, as barriers come down, into loud, exciting learning spaces. This collaborative 
is one of the largest and one of the most unique ventures to be based directly on academic research, 
demonstrating the importance of research and of applying research to practice. YOUmedia is about
reimagining learning in the 21st century.
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3 Connected Learning 

Ito’s research, supported by the MacArthur Foundation, expanded from the original 3 principles to a more 
holistic look at how 21st century youth are learning from digital media. The term ‘connected learning’ has
been applied to this overall view, and specific principles underlying it have been identified (Ito et al., 2013).


Figure 3: Mizuko Ito 


The purpose of recognizing connected learning is to provide teen users of online networks with the 
opportunity to learn with the support of peers and caring adults in ways not otherwise possible or available. 
“This model is based on evidence that the most resilient, adaptive, and effective learning involves individual 
interest as well as social support to overcome adversity and provide recognition” (Ito et al., 2013, p. 4). The
topic of digital badges has been incorporated into the implementation of connected and shared learning 
with research currently underway to determine the value of these seemingly extrinsic motivators. Mozilla 
and HASTAC as well as the Gates Foundation support this collaborative effort. One organization that has 
adopted the idea of rewarding librarians with badges for reaching identified competencies is the Young 
Adult Library Services Association, a division of the American Library Association. 


4 The Value of Information Professionals in Learning Spaces 


There have been two formal annual evaluations of the YOUmedia experiment conducted by the University 
of Chicago Consortium on Chicago School Research (CCSR). One of the questions to be answered in the
YOUmedia learning lab was what made the difference between teens who were just hanging out and those who
were geeking out (Austin et al., 2011; Sebring et al., 2013). From teens came the answer: the mentors.

Supportive adults who built upon teen interests but provided visions into possibilities were key to 
the success of the learning labs. YOUmedia staff found that teens left on their own did not automatically 
connect with workshops and other structured activities that were designed to teach new skills and provide 
opportunities to explore interests more deeply. That changed when adults reached out to connect with 
youth socially, acting as guides and “cool” collaborators (Austin et al., 2011, p. 2).


5 What This Means for iSchools Educating 21st Century Librarians


Paying attention to the transformation that is taking place in libraries across the U.S. and other countries,
e.g., The Edge in Queensland, Australia, is essential for iSchool educators. It means a two-way street:
research into practice and practice back into research and teaching. Traditional ‘library school’ courses do
a disservice to aspiring information professionals unless they acknowledge the breaking down of barriers to
learning that has transformed the spaces and roles of libraries. At least one university in the U.S. has
taken on the role of emphasizing Digital Youth in terms of hiring faculty and developing new courses and 
curriculum. Moreover, faculty from this university are spearheading efforts to develop a community of 
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researchers from a wide variety of disciplines in a ‘connected learning’ collaborative network. Others have 
the same purposeful agenda, in order to provide the proper educational experiences for their students.


6 Barriers Broken Down 


Numerous barriers have come down as this innovative project of transforming libraries into learning labs has unfolded.
The barriers between and among organizations and associations have come down as the value of 
collaborative relationships, such as YOUmedia, has become apparent. One of the most significant ones is 
the barrier between adults as custodians of information and youth as passive seekers. Adults and youth are
partners in a productive learning environment; for some libraries this means a change in fundamental culture 
as well as context. And it means putting youth in the forefront of decision-making. More attention has been 
given to the learning spaces in libraries and they have been designed for more flexibility and appeal. The 
recognition of the validity of building upon youth interests rather than directing them and the need for 
equitable access to information have come to the forefront. The value of both technologies and people has 
come to the forefront in an age of questioning the role of librarians. Ito’s research, which looks deeply into
the context of the learning that takes place with digital media and broadly into connected learning
situations, is key to understanding the widespread implications of this transformative educational program.
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Abstract 

In today’s Information Multiverse there are pressing societal, ethical and educational, as well as 
intellectual reasons why, rather than simplifying and reducing or streamlining, we should actually be 
complexifying and increasing our efforts to generate metadata that identifies, collocates, contextualizes, 
authenticates and enfranchises. Such metadata should not only draw upon, but should also 
simultaneously incorporate re-thinking of fundamental and long-established bibliographic and archival 
principles in light of the plural and increasingly post-physical nature of the Information Multiverse. Our 
ongoing research is modeling an Information Multiverse approach to metadata by identifying ways in 
which these complexified principles can be embedded in local, community and global (i.e., web) metadata 
infrastructures. Their underlying references to common concepts additionally open up the possibility of 
interoperability and re-use, and, outside the silos of professional/information fields, linking and
navigation.
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1 Introduction 


Canadian archival scholar Terry Cook suggests that over the past 150 years, the archival identity has been 
shifting across four different frameworks or mindsets that reflect the increasing complexity of the archival 
field’s societal role: juridical legacy, cultural memory, societal engagement, and community archiving. He 
sees these as being cumulative rather than completely replacing each other. As a result, he argues, the 
archivist “has been transformed . . . from passive curator to active appraiser to societal mediator to 
community facilitator. The focus of archival thinking has moved from evidence to memory to identity and 
community, as the broader intellectual currents have changed from pre-modern to modern to postmodern 
to contemporary” (Cook, 2012). In a related vein, the Archival Multiverse has been proposed by an 
international group of scholars in archival studies as an overarching framework within which twenty-first 
century archival practice and scholarship should be situated. The Archival Multiverse is simultaneously 
locally and globally oriented and encompasses “the plurality of evidentiary texts (records in multiple forms 
and cultural contexts), memory-keeping practices and institutions, bureaucratic and personal motivations, 
community perspectives and needs, and cultural and legal constructs” (Pluralizing the Archival Curriculum 
Group, 2011). 

Although these two recent and influential statements have emanated out of the archival field, we 
can extrapolate from them several key arguments that need to be more consciously factored into broader 
discussions about the nature and future development of information infrastructure in the digital world, as 
well as into the scope and responsibilities of the information professions. When we consider the plurality of 
all informational texts, practices and institutions, organizational and human motivations, community 
perspectives and needs, and cultural and legal constructs, we can also extend the concept of the Archival 
Multiverse to one of an Information Multiverse. The key arguments might be expressed as follows: 
• Distinct societal and professional roles have emerged for particular information fields as a result of
diverse and dynamic needs, cultures, technological developments, and political and intellectual
trends. Some aspects of these roles continue to remain essential while important new aspects
continually emerge that must be conceptualized and engaged.

• Power differentials and inequities are at work in every aspect and context of information creation,
organization, access and use. Developers of information infrastructures must more directly
acknowledge and address the negative consequences of such power dimensions, and actively support
emancipatory information structures and practices while balancing competing considerations at and
between local, community and global levels.

• The need for rich context becomes increasingly important as the volume of content online multiplies,
and as that content is read both “along” and “against” the grain by diverse users and for a broad
range of purposes.

• Pluralism and complexity are defining, and arguably the richest and most emancipatory,
characteristics of the Information Multiverse. Professional information practices and infrastructures,
however, tend to encourage homogenization, assimilation and decomplexification.


This poster reports on new conceptual work by the authors that is applying these arguments to metadata 
infrastructure, practices and use, and situating them within the framework of the Information Multiverse. 

We argue that in today’s Information Multiverse there are pressing societal, ethical and educational, 
as well as intellectual reasons why, rather than simplifying and reducing or streamlining, these professions 
should actually be complexifying and increasing their efforts to generate metadata that identifies, collocates, 
contextualizes, authenticates and enfranchises. Such metadata should not only draw upon, but should also 
simultaneously incorporate re-thinking of fundamental and long-established bibliographic and archival 
principles in light of the plural and increasingly post-physical nature of the Information Multiverse. The 
fundamental purposes of bibliographic information organization are to find, collocate and educate. In 
archival information organization, the primacy of provenance and of hierarchical and collective archival 
description serve to contextualize and authenticate, employing differing levels of detail as judged appropriate 
or necessary. Taken together but re-articulated for a twenty-first century networked and globalized 
information context, they represent a powerful, but underexploited and under-complexified amalgam of 
approaches for generating niche metadata capable of strategically addressing the profound pluralism, power 
differentials and persistent inequities of the post-physical Information Multiverse, while also exploiting the 
high-level linking power of the Semantic Web. 

Our ongoing research is modeling an Information Multiverse approach to metadata by identifying 
ways in which these complexified principles can be embedded in local, community and global (i.e., web) 
metadata infrastructures. Their underlying references to common concepts additionally open up the 
possibility of interoperability and re-use, and, outside the silos of professional/information fields, linking 
and navigation. We posit, therefore, that: 


1. The existing and potential scope of each of these principles should be identified and articulated 
within a twenty-first century metadata context as well as in response to critiques that have emerged from
or regarding underempowered or disenfranchised communities. These critiques assert that 
bibliographic and archival descriptive standards express and propagate dominant cultures and 
values, and privilege prevailing concepts of users’ needs (Knowlton, 2005; Russell, 2006; Duff and 
Harris, 2007; Flinn et al., 2009). An example would be the various recent proposals to expand the 
principle of provenance. In part due to the networked creation of bureaucratic records and in part 
because of a movement to include and acknowledge more voices within the archive, the concept of 
provenance, often presented as a monolithic principle, has been deconstructed and problematized 
to acknowledge the complexities of authorship that were always present in records as well as those 
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resulting from the kinds of relationships and collaborations made possible by networking and 
encouraged by globalization. Both functional provenance and multi-provenance have been applied 
to address practically how digital records are being created in and across organizations and research 
endeavors, while co-creatorship and the closely related constructs of multiple simultaneous and 
parallel provenance have arisen out of archival theorizing (particularly in records continuum and 
postcolonial and gender studies research) about the genesis of records and how this should best be 
described (Hurley, 2005a and 2005b). These propositions argue that traditional notions of 
provenance are oversimplified. With their emphasis on a single creating entity, those notions fail to 
acknowledge that multiple parties with different types of relationships to each other can be involved 
in the genesis of records (Gilliland, 2012). The propositions maintain, for example, that subjects as 
well as creators of records should be acknowledged as participants in that genesis and that archivists 
have an ethical imperative to pursue descriptive mechanisms for representing both creator and co- 
creator worldviews. 

2. Information professionals should make strategic decisions about when or under what circumstances 
there might be a compelling need to create rich or even alternate descriptions to address specific 
identified needs of particular underempowered or niche communities, e.g., through the use of 
pluralized access points, complex authority files that address co-creator roles, and bilingual 
descriptions; and when a higher-level approach, potentially supported by linked data might suffice. 

3. In devising standards, best practices, regulations, and terminology for international implementation, 
developers need to focus less on getting everyone to do things the same way, and more on how to 
inter-relate diverse community practices and ontologies. 


2 Conclusion 

By identifying ways in which these complexified principles can be embedded in local, community and global 
(i.e., web) metadata infrastructures, their underlying reference to common concepts opens the possibility of 
interoperability and re-use, and, outside the silos of professional/information fields, linking and navigation. 
An Information Multiverse approach to metadata not only breaks down walls between information fields, 
it also provides a way to address power differentials and inequities across cultures and communities, and 
supports enhanced technological access provision. Moreover, it advances the conceptualization of the
semantic web by arguing that the legacy metadata based on these established principles and practices 
should be carried further into the new technological environment of linked open data, lossless as to the 
information provided. Strategies and methods for doing so, as well as enabling “dumbing-up” to simpler or 
less granular statements such as dublincore.org or schema.org in order to interoperate data content have 
already been researched and some solutions proposed (Dunsire, 2012; Dunsire et al., 2012). Moreover, 
research in and publishing of bibliographic standards and conceptual models in RDF (Resource Description 
Framework) has shown that the present ways of representing them for humans to read should be 
transformed in a way also for machines to process and infer the meaning of metadata that result from 
applying them (Willer and Dunsire, 2013). This process opens up the platform to rebuild the prevailing 
bibliographic and archival principles and concepts with the aim of meeting the plural and increasingly post- 
physical nature of the Information Multiverse. 
New forms of citizen participation in disaster response continue to emerge as disasters create the need, 
and technology creates the opportunity. The rapidly evolving connectivity and technologically mediated 
environment promise to expand the role of citizens not only participating through organized efforts, but 
also self-organizing group efforts. Using a case study method, this research aims to describe and explain 
the dynamic processes of how ordinary people come into being as a group for humanitarian efforts and 
maintain evolving processes of collaborative activities over time. I am exploring these issues in the context 
of a small group that emerged online in response to the 2011 Japan earthquake and tsunami. This poster 
1 Introduction 


The greatest challenge following a disaster is to provide the right aid to the right people at the right place 
and the right time. Recent studies reveal that the technologically mediated landscape of disaster response 
is creating a new avenue where people converge digitally and respond directly to the needs of 
disaster-affected communities regardless of where they are located (Hughes and Palen, 2009; Starbird, 
2011). In the last decade, globally distributed digital volunteers with expertise in computing have built 
the groundwork for such efforts. Data standards for the online aggregation of registry information such as 
“PeopleFinder” from September 11, 2001, the disaster management system “Sahana” from the 2004 Indian 
Ocean earthquake and tsunami, and the interactive information mapping platform “Ushahidi” from the 2007 
Kenyan crisis exemplify the recent phenomenon of volunteer-based, immediate citizen response operations 
(Wall & Chery, 2010; van de Walle & Turoff, 2007). Most research on this new trend of volunteer efforts 
has identified the partial involvement of ordinary citizens: those who have rudimentary skills and basic 
knowledge of computer use, and those who are capable of appropriating familiar technologies and adopting 
new ways of using them. This research suggests that ordinary citizens are newly enabled to offer 
informational assistance through organized efforts that are purposefully crafted, organized around a 
participatory culture, and designed as crowdsourcing modules (Denning, 2006; Palen & Liu, 2007). 

Most cases of digital volunteer work cast ordinary citizens in the role of carrying out given tasks 
distributed by the dedicated, skilled individuals who lead these efforts. Much of the relevant research 
also questions the technical capabilities of large-scale organized efforts or demonstrates the advantages 
of new technologies, such as the two-edged effects of unfiltered information: serving as a great source of 
relevant, comprehensible, and actionable information while also creating misinformation and 
disinformation (Zook et al., 2010; Fraustino et al., 2012). In this study, I have taken a different tack. I focus 
on ordinary citizens who self-organize disaster relief efforts by bringing their everyday knowledge and 
practices of ICT rather than by creating something technical. Instead of studying how to eliminate 
informational problems or characterizing the advantages of computing the mass-scale information that the 
public provides and produces, my ongoing research attempts to describe and explain the dynamics of ICT use 
that evolve over time when ordinary citizens come into being as a group for humanitarian efforts. More 
specifically, this research focuses on people outside the disaster-affected sites. It addresses the 
following questions: what kinds of activities are involved beyond information processing, what efforts are 
coordinated, and how does self-organized collaboration take place for the sake of disaster aid assistance? 
I am exploring these issues in the context of a small group that emerged online in response to the 2011 
Tohoku earthquake and tsunami in Japan. This poster presents a case study of the work in progress. 


2 Background 


On March 11, 2011, a magnitude 9.0 earthquake centered in Northeast Japan caused a tsunami ranging 
from 10 to 100 feet high, together with the subsequent nuclear meltdowns at the Fukushima Daiichi 
Plant, approximately 155 miles northeast of Tokyo. In excess of 27,000 persons in Japan were killed or 
missing, and more than 400,000 homes and other buildings were totally or partially damaged. Physical 
damage of $309 billion, nearly four times the $81 billion caused by Hurricane Katrina, makes it the most 
expensive disaster in history (Nanto et al., 2011). This “mega-disaster” (the 3.11 disaster), which hit one 
of the wealthiest countries, reveals new dimensions for understanding not only international but also 
ICT-mediated response systems to disasters (Guha-Sapir et al., 2012). 

As with Hurricane Katrina, the Haiti earthquake, and other disasters, the 3.11 disaster received an 
outpouring of support from the world in various ways (Shklovski et al., 2010; Zhou and Lee, 2013). Through 
Twitter, Facebook, Flickr, and YouTube, for example, compassionate individuals around the globe shared 
what they were doing for the suffering others. Romanian librarians and their patrons uploaded a video of 
cheering choirs. A text message sent by a mother who had been evacuated from the tsunami together with her 
students made its way to her son in the United Kingdom, who tweeted a rescue call on behalf of his mother; 
as a result, his tweet led to the rescue helicopter that eventually saved all the evacuees, including his 
mother and her students (Okamura, 2011). These global responses indicate that first responders are not 
only victims, unharmed victims, professional rescue officers and agents, and neighboring residents, but 
also members of the public on a global scale (Murthy, 2011; Palen et al., 2009). 

Especially for Japanese who live far from the disaster site, whether within Japan or abroad, these 
ICT became the most convenient and economical means to reach out directly to the people affected in 
Japan, to share real-time information across the world through networks of digitally connected others, 
and to make decisions about how to help. As one of these myriad virtual responders, I, a Japanese 
expatriate following the aftermath of the 3.11 disaster, started to recognize a pattern of global responses 
beyond the groups of technically savvy experts and experienced volunteers and the projects for monetary 
donations and emotional and eventful purposes. Among sensible relief efforts, a typical response was 
material aid sent to Japan from abroad, which included, for instance, handicraft quilts shipped from 
Singapore, muslin from the UK, and origami from Canada. As the domestic response, on the other hand, 
visual and image archiving projects and other efforts to preserve visual information were also prevalent. 
While these efforts originated in and emerged through networks of concerned individuals and made their 
way promptly to the disaster-affected communities, their lifespan was most likely transient, or else they 
were incorporated into similar or larger enterprises. 

Compared to these examples, I found a rare case of emergent group efforts in response to disasters. The 
case presented in this paper is a group of Japanese women in Finland. These women demonstrated 
paradoxical aspects in terms of 1) a population outside the disaster site, residing thousands of miles 
away, that was nevertheless able to take action and persist in managing an improvised humanitarian 
effort, 2) the emergence of a self-organizing group made up of ordinary people who had no domain-specific 
knowledge or prior experience of disaster response and aid operations, and 3) remote collaborative work 
involving undefined tasks and unknown sequences of group activities with unfamiliar members. Here is a 
brief story of the case. 

Seven Japanese women living in two cities in Finland became leaders of a self-organized 
humanitarian aid group for the 2011 Japanese disaster. What began as posts disseminated through blogs 
and Twitter became “Tutteli to Japan” (TTJ), a project to send Finnish milk formula to the 
disaster-affected communities. The TTJ project started in the pre-existing blog of a Japanese 
housewife living in Finland. It began as a single post articulating compassion and, indeed, a sense of 
powerlessness. This was a typical kind of post seen in many settings, especially right after the disaster. And 
yet four days later, all of a sudden, another post on the blog began by introducing a baby formula product 
from Finland. Then the online interaction snowballed into group action and collaborative work. Eventually 
the collective of individuals worked on their idea of “do something for Japan” and turned it into a 
humanitarian effort (Table 1). 


What?              A project that shipped a total of 12,000 Tutteli milk cartons from Finland to 12 different 
                   locations in the disaster-affected areas of Japan through 6 separate shipments within 
                   35 days. 

Who was involved?  7 lead members: Japanese women living in Finland (Helsinki and Tampere). 
                   More than 15 active members who reside in six different prefectures in Japan (Nagano, 
                   Chiba, Tokyo, Miyagi, Iwate, and Fukushima). 

Why baby formula?  In Japan, baby formula is manufactured and sold only in powder-based form, while in 
                   Finland the liquid formula product Tutteli is even made for newborns and is well adapted 
                   in the Finnish culture. Japanese mothers of the TTJ lead members believed that Tutteli 
                   would help mothers who were in the disaster sites experiencing difficulties in feeding 
                   their babies due to the turbulent, unsettling circumstances with little or no electricity 
                   and water supplies. 

How?               Over the course of 35 days, the TTJ members went through an ad-hoc process of 
                   connecting points to points that eventually led to the people who needed the milk within 
                   the disaster site. The lead members fundraised in Helsinki and online, purchased the 
                   cartons with the donations, picked up the cartons from the manufacturer and delivered 
                   them to Finnair Cargo at the Helsinki airport. 
                   The active members, who found out about the TTJ through social textual spaces 
                   including Twitter and weblogs, offered to 1) identify who needed the milk in the 
                   disaster site, and 2) pick up, deliver, and distribute the milk cartons to the identified 
                   recipients. 
                   Finnair volunteered free cargo shipments. 

When?              The idea started circulating around March 14th. 
                   The first shipment departed Helsinki on March 25th and was delivered to the disaster 
                   sites on the 29th. 
                   The remaining five shipments were completed between April 1st and April 30th. 

Where to?          The mothers in the 12 different disaster sites received the bulk supplies of milk. 


Table 1: A snapshot of the “Tutteli to Japan” project (TTJ) 


3 Citizen-led, self-organized humanitarian efforts 

Grassroots efforts in response to disasters emerge at any given time, space and scale of disaster (Enarson, 
2012). Our review of communities in disasters shows that disaster studies repeatedly capture groups of 
ordinary citizens engaging in rescue and response operations, particularly within the disaster-affected 
communities (Takazawa and Williams, 2011). As people outside the disaster sites also begin to learn and 
seek information about the disaster impacts, residents of neighboring communities and beyond join the 
response operations. Proximity to the disaster site is considered the critical factor for emergent behavior 
to potentially form organized efforts (Drabek, 1986). These “emergent group responses” tend to be small 
and short-lived, relying on disaster-affected individuals who do not remain helpless but, however 
devastated, take on the role of first responders in order to serve the suffering others (Drabek, 1985; 
McEntire, 2007). The members of emergent groups are private citizens, individuals who are distinctly 
independent from organized entities, with or without pre-existing structures and experience of working 
together on a variety of tasks prior to disasters, in variously conditioned work environments. Dynes and 
Quarantelli (1975) developed a typology of emergent groups that differ in their group structure (new or 
old) and tasks (irregular or regular) as the primary elements defining organized efforts. Stallings and 
Quarantelli (1985) suggest that the developmental nature of post-disaster environments creates different 
demands for different groups at different times. Moreover, because disasters are concentrated in time 
and space, the geographic locales of responders determine the accessibility and feasibility of action. 

Recent studies identify the critical roles that technologies play in enabling ordinary people to 
participate and engage in disaster and crisis responses in new ways. Liu (2011) and Takazawa (2010) argue 
that ordinary citizens who do not necessarily engage in disaster response and humanitarian activities at 
present do respond to long-past disastrous events through social interactions and discussions facilitated 
by social media. The social media space functions as a virtual gateway for ordinary citizens to reflect on, 
reconstruct, and reinterpret disastrous events, so that disasters and crises become timeless. Pentzold 
(2009) and Mori (2011) also argue that the temporal connectivity that social media platforms provide 
serves as the inception of infinite connectivity among distant others. Sharing photos and motion pictures 
on YouTube, Wikipedia, weblogs and other social websites fosters sentimental attachment and a sense of 
space among individuals in the form of collective knowledge construction, global memorization, and 
collective sensemaking. As the landscape of disaster response has diversified, people both inside and 
outside the disaster-affected site or communities can proactively adapt and appropriate available 
resources and technologies to continue or reestablish the connectedness among themselves across time 
and space (Mark and Sameen, 2009). Moreover, the resilient behavior of ordinary people who engage in 
disaster relief efforts and try to maintain connectedness with people outside their proximity contributes 
significantly to helping the whole community affected by disasters regain the internal strength and 
resources to cope with the impact of disasters (Murphy, 2007). 

As these studies suggest, the increasing adaptation of ICT and the relentless occurrence of 
destructive events around the globe promise to expand the role of citizens, not only through participation 
in such organized efforts as digital volunteer work, but also through self-organizing group efforts beyond 
time and space (Starbird, 2011; Starbird and Palen, 2013). However, the processes that ordinary citizens go 
through in making such efforts remain understudied. This research aims to explain in detail the unfolding 
processes of self-organizing groups as well as the non-linear interactions among members who come from 
different locations, cultures, and backgrounds in disaster contexts. 


4 Method 

The present poster presents a single case of a self-organizing group comprising geographically 
dispersed ordinary citizens who had no history of working together, no prior social relations, and no 
specific professional or personal affiliations but formed a group after the 2011 disaster in Japan. To our 
knowledge, studying this type of self-organizing group outside controlled laboratory settings is new. 
A case study method provides a flexible approach to gathering detailed contextual data and conducting 
in-depth analysis of particular phenomena; this single case study design allows us to identify the 
specificities of self-organizing groups and analyze data from a real setting in detail (Eisenhardt, 1989; 
Torrey et al., 2007; Tschan et al., 2009; Goggins et al., 2012). Currently, using opportunistic and snowball 
sampling methods, I am gathering the TTJ data that I can retrieve electronically using commercial search 
engines, RSS feeds, and proprietary databases. These data sources include the TTJ leaders’ and 
participants’ blogs, TTJ tweet archives, news coverage, and other publicly available data (in both textual 
and visual forms). These materials are available in Japanese, English, and Finnish. I have begun to 
translate this primary set of data and apply the grounded theory approach to it. I am using the blog posts 
to trace the temporal processes in the emergence of the group as well as the project. I am looking into 
interactions and activities to gather both factual and nuanced descriptions of the involved parties and 
activities, whether explicit or implicit. 


5 Preliminary Findings and Discussions 


Certain features of this case differ from other self-organizing groups that have emerged in disasters. 
Unlike the well-studied digital volunteer efforts I discussed earlier, this particular type of 
self-organizing group is unique in three aspects: 1) the technologies that they used, 2) the aid that they 
provided, and 3) the scale of their action and group. The technologies the group used are ones that they 
were familiar with, and using such generic technologies the group could transform their everyday practices 
into powerful action. Their use of such generic technologies accelerated initial attention from a larger 
pool of people and led to immediate response and interaction, as in the creation of an instant information 
infrastructure and backchannel in the social media space. 

Also, according to stereotypes of Japanese people, in Japan or abroad, volunteering is not a cultural 
practice in Japan. Taniguchi (2010) claims that “[T]he idea of volunteer work as an act of giving time to 
help others on one’s own initiative is relatively new in Japan” (p. 161), because of its long tradition of 
neighborhood collective culture and social responsibility attached to local communities. In terms of women 
in disaster relief activities, we might not expect them, especially Japanese women, to seize the initiative 
and self-organize (Ferris et al., 2013). Aspects of Japanese culture involve a lot of coordinated 
collaborative activity with high levels of social coherence and participation. But this is typically 
coordinated by recognized leaders, and there are distinct hierarchies and roles involving status and 
gender, which is why one might be surprised at this particular form of self-organizing. 


6 The next step 


I plan to collect additional data by interviewing and running focus groups with the TTJ leaders. With 
snowball sampling, I plan to continue finding additional data sources to include the TTJ participants. Since 
the preliminary findings presented in this document are all based on data that I gathered from members’ 
weblogs, websites, and other publicly available online resources, I want to verify the findings and identify 
the actual parameters of the processes: how the group came into being, what kinds of group interaction and 
activities were carried out over time, what barriers and obstacles appeared and how they were resolved, who 
was involved in the process, how these individuals came to know the project and each other, and what their 
decisions were about participating and engaging in the project. 
Abstract 

Qualitative content analysis is commonly used by social scientists to understand the practices of the 
groups they study, but it is often infeasible to manually code a large text corpus within a reasonable time 
frame and budget. To address this problem, we are building a software tool to assist social scientists 
performing content analysis. We present our semi-automatic system that leverages natural language 
processing (NLP) and machine learning (ML) techniques for initial automatic coding, which human 
coders then review and correct. Through active learning, these human-verified annotations are 
subsequently used to train a higher performing model for machine annotation. We discuss design 
strategies adopted to optimize the system performance. 
1 Introduction 


Social scientists often use content analysis to understand the practices of groups by analyzing texts such as 
transcripts of interpersonal communication. Content analysis is the process of identifying and labeling 
conceptually significant features in text, referred to as “coding” (Miles and Huberman, 1994). However, 
analyzing text is very labor-intensive, as the text must be read and understood by a human. Consequently, 
important research questions in the qualitative social sciences may rely on insufficient data or may fail to 
be addressed at all. 

Computers offer large-scale processing capabilities to deal with systematic patterns in data. 
However, computers are still not able to truly understand the more subtle meanings in text, so full 
automation of qualitative content analysis is not yet possible. Furthermore, many natural language 
processing (NLP) and machine learning (ML) techniques require training on a large amount of coded data, 
which is time-consuming to produce. 

To make qualitative content analysis more scalable, we propose a semi-automatic system that uses 
a small set of hand-coded examples created by human coders to build a model that can perform a first pass 
of coding. Human coders then review and correct machine-identified instances of codes. These human-verified 
machine annotations are then used as additional training examples to improve the performance of the ML 
model. Using this “active learning” approach, we create a significantly larger pool of training examples in 
a reduced time frame. This paper presents the framework of our proposed approach, and reports some 
preliminary findings in our initial efforts to optimize the configuration of ML models to perform automatic 
coding. 
2 Related Work 


Many computer-assisted qualitative data analysis software (CAQDAS) tools have been developed to support 
text analysis, but these are not intended for full automatic coding (Alexa, 1997). Researchers have 
attempted to automate content analysis by applying NLP and ML technologies to identify linguistic 
patterns in text. For example, Crowston et al. (2010) manually developed NLP rules to automatically 
identify codes related to group maintenance behavior in free/libre open source software (FLOSS) teams. 
Ishita et al. (2010) used ML techniques to automatically classify sections of text within documents 
according to ten human values taken from the Schwartz Value Inventory. Broadwell et al. (2012) developed language 
models to classify sociolinguistic behaviors used to infer social roles (e.g., leadership). The accuracy of these 
approaches on the best performing codes ranges from 60-80%, showing the potential of automatic qualitative 
content analysis. However, these prior studies have been limited to a particular set of theoretical concepts, 
limiting their general utility. 


3 Approach 


Figure 1 shows the three major components in our proposed semi-automatic approach: 1) human annotation, 
2) machine annotation, and 3) human correction of machine annotation. 


Figure 1: Major system components 


Human Annotation: Human coders first manually code a sample of the corpus (in ATLAS.ti, a CAQDAS 
package) to develop gold standard data for machine annotation. Once manual coding has been finalized, 
the coded text in ATLAS.ti is exported in an XML format. 


Machine Annotation: The gold standard data from the human coders is used to train a support vector 
machine (SVM) model using pre-selected features and parameters. We approach the machine annotation 
problem as a text classification task, classifying sentences from the corpus as containing or not containing 
various codes. 


Human Correction: Machine annotations are corrected by human coders. The human-verified annotations 
are saved as “silver data”, and subsequently added to the training set to enhance the performance of the 
existing model through active learning. This human feedback loop grows the training set gradually, rather 
than requiring a large initial coding effort. 
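
The feedback loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `train_keyword_model` is a toy stand-in for SVM training, `feedback_round` and the example sentences are hypothetical, and the reviewer is simulated with a lookup table rather than a human coder.

```python
def train_keyword_model(examples):
    """Toy stand-in for SVM training (an assumption, not the system's model)."""
    positive_words = set()
    for sentence, has_code in examples:
        if has_code:
            positive_words.update(sentence.lower().split())

    def predict(sentence):
        # Flag a sentence if it shares any word with a positive example.
        return bool(positive_words & set(sentence.lower().split()))

    return predict


def feedback_round(training_set, unlabeled, train, review):
    """Train, machine-annotate, collect human corrections, grow the set."""
    model = train(training_set)
    # Only sentences the model flags are shown to the human coder.
    flagged = [s for s in unlabeled if model(s)]
    # The reviewed labels become "silver data".
    silver = [(s, review(s)) for s in flagged]
    return training_set + silver, silver


# Hypothetical gold data for a single code (True = sentence carries the code).
gold = [("sorry for the delay", True), ("build is green", False)]
pool = ["sorry about that", "the patch is ready", "tests pass"]
# Simulated human judgments, standing in for manual review.
human_truth = {"sorry about that": True,
               "the patch is ready": False,
               "tests pass": False}

training_set, silver = feedback_round(gold, pool, train_keyword_model,
                                      human_truth.get)
```

In this toy run the model flags two sentences, the simulated reviewer rejects one of them as a false positive, and both reviewed sentences join the training set for the next round.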


The model performance is assessed by a combination of recall and precision. Recall measures the ability of 
the model to find all instances of a code in the corpus, whereas precision measures the percentage of instances 
returned that are correctly classified. In our system, we emphasize recall, because if recall is high enough, 
the annotator can depend on the system to find most instances of codes rather than searching the text 
manually. To achieve high levels of recall, precision will be low, at least initially. Even though human coders 
will have to review a number of false positives, the system should still save time in large-scale content 
analysis, as the coders have to read only a subset of the corpus. Adding the human corrected data should 
improve the precision of the model for future rounds, ideally to the point that the system will produce 
accurately-coded data without human input, though practically, a final human review may still be necessary. 
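
For concreteness, the two measures can be computed per code as follows. This is a minimal sketch: the function name `precision_recall` and the sentence ids are illustrative assumptions, not part of the system described here.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for one code.

    gold: set of sentence ids that human coders marked with the code.
    predicted: set of sentence ids that the model marked with the code.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: 4 gold sentences; the model flags 10 and finds 3 of them.
gold = {1, 2, 3, 4}
predicted = {1, 2, 3, 10, 11, 12, 13, 14, 15, 16}
precision, recall = precision_recall(gold, predicted)
```

Here recall is 3/4 and precision is 3/10: a coder would review ten flagged sentences to recover three of the four true instances.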


4 Preliminary Findings 

We report preliminary findings from our efforts to optimize the configuration of ML models to perform the 
first pass of machine annotation. For these tests, we use a gold standard corpus created in a study of 
leadership behaviors exhibited in emails from a FLOSS development project (Misiolek et al., 2012). This 
gold standard corpus consisted of 408 email messages. There were a total of 39 codes in the coding scheme. 
Sentences may be assigned more than one code. Framing the coding as a multi-label classification task, we 
trained a binary model for each code using SVM with ten-fold cross-validation. These results do not use 
any “silver data”. 
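
The multi-label framing and the ten-fold split can be illustrated with plain Python. The sketch below makes several assumptions: the helper names are hypothetical, the folds are assigned round-robin without shuffling or stratification, and no SVM is trained.

```python
def binary_targets(sentence_codes, code):
    """One-vs-rest targets: True where a sentence carries the given code."""
    return [code in codes for codes in sentence_codes]


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_items) if i not in held]
        yield train, held_out


# Hypothetical coded sentences: each sentence may carry several codes or none.
sentence_codes = [{"Approval"}, {"Apology", "Approval"}, set()]
approval_targets = binary_targets(sentence_codes, "Approval")
apology_targets = binary_targets(sentence_codes, "Apology")

splits = list(k_fold_indices(20, k=10))
```

One binary target vector is built per code, so a sentence with two codes is a positive example for two separate models. For rare codes, an unstratified split like this one can leave some folds without any positive example.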

To date, the best model (the one that resulted in the highest recall in model learning) uses only 
lowercase unigrams as features, with certain specific tokens such as numbers and hyperlinks substituted 
with more generic tags (e.g., all occurrences of numbers are substituted by <num>). The highest average 
recall we achieved for all 39 codes is 0.702, meaning that the model is able to detect 70% of positive instances 
on average from the corpus. On the other hand, average overall precision is only 0.078. Table 1 highlights 
the top five individual codes with the highest recall. As expected given the high recall, the precision is low. 
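
The feature extraction described above (lowercase unigrams with generic tags) might look like the following sketch. The `<num>` tag comes from the text; the `<url>` tag name, the regular expressions, and the function name are assumptions made for illustration.

```python
import re

# Assumed patterns; the exact rules used by the system are not given in the text.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")
TOKEN_RE = re.compile(r"<url>|<num>|[a-z]+(?:'[a-z]+)?")


def unigram_features(sentence):
    """Lowercase a sentence, replace hyperlinks and numbers, return unigrams."""
    text = sentence.lower()
    # Replace hyperlinks first so digits inside URLs are not tagged as numbers.
    text = URL_RE.sub(" <url> ", text)
    text = NUM_RE.sub(" <num> ", text)
    return TOKEN_RE.findall(text)


tokens = unigram_features("Thanks, patch 42 is at http://example.org/p/7")
# tokens == ['thanks', 'patch', '<num>', 'is', 'at', '<url>']
```

Collapsing specific numbers and links into generic tags keeps the vocabulary small, which matters when some codes have only a handful of training examples.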


Code                          Gold Frequency  Precision  Recall 
Approval                      12              0.062      0.95 
Commit/Assume Responsibility  17              0.041      0.935 
Apology                       9               0.026      0.929 
Phatics/Salutations           116             0.174      0.896 
Inclusive Reference           146             0.336      0.873 


Table 1: Top five codes from FLOSS pilot data with the highest recall 
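
The figures in Table 1 can be read as reviewer workload. Since precision is the share of flagged sentences that are correct and recall is the share of gold instances found, the number of flagged sentences is roughly recall × gold frequency / precision. The sketch below applies that identity; the function name is hypothetical, and because the reported values are presumably averaged over cross-validation folds, the results are rough estimates only.

```python
def implied_flagged(gold_frequency, precision, recall):
    """Estimate how many sentences the model flags for one code."""
    true_positives = recall * gold_frequency
    return true_positives / precision


# Values for the "Approval" row of Table 1.
approval_flagged = implied_flagged(gold_frequency=12, precision=0.062, recall=0.95)
```

For "Approval" this gives about 184 flagged sentences to review in order to recover roughly 11 of the 12 gold instances.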


Figure 2 shows the distribution of all codes into four quadrants based on the level of recall and precision. 
Recall and precision values above 0.5 are considered to be high; 0.5 or below, low. A good model would 
have results in the high/high quadrant, but none of our codes currently reach this level. For the majority 
of the codes (30 out of 37), system performance falls into the quadrant with high recall and low precision, 
reflecting our strategy to tune the model for high recall even at the expense of low precision. A model with 
low recall and high precision might be more accurate in what it returns but would also miss many positive 
instances, which could result in invalid conclusions when the coded data are analyzed. None of the codes 
fall into this quadrant. 
Figure 2: Distribution of individual codes based on initial recall and precision 
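
The quadrant assignment follows directly from the 0.5 threshold stated above. As a minimal sketch (the function name and label strings are illustrative), applied to the five codes of Table 1:

```python
def quadrant(recall, precision, threshold=0.5):
    """Values above the threshold count as high; the threshold or below, low."""
    recall_level = "high" if recall > threshold else "low"
    precision_level = "high" if precision > threshold else "low"
    return f"{recall_level} recall / {precision_level} precision"


# (precision, recall) pairs taken from Table 1.
table_1 = {
    "Approval": (0.062, 0.95),
    "Commit/Assume Responsibility": (0.041, 0.935),
    "Apology": (0.026, 0.929),
    "Phatics/Salutations": (0.174, 0.896),
    "Inclusive Reference": (0.336, 0.873),
}
quadrants = {code: quadrant(recall, precision)
             for code, (precision, recall) in table_1.items()}
```

All five codes land in the high recall / low precision quadrant, consistent with the tuning strategy described in the text.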


Finally, for seven codes, the model exhibits both low recall and low precision, an undesirable outcome. Five 
out of seven codes had fewer than ten examples each in the gold standard corpus, which were too few 
examples to effectively train a useful model. Providing more instances of these codes in the gold standard 
data may improve performance. However, two other codes (Problem Solving and Managing Conflict) had 
more examples in the gold standard corpus but still fell into the same quadrant. Further consideration of 
these codes reveals that the theoretical concept being captured is actually a complex process rather than a 
simple behavior. As a result, the coded sentences included considerable variation, which is hard for a model 
to learn. Indeed, the human coders independently reached the same conclusion, and these two codes have 
subsequently been removed from the code book. 


5 Conclusion and Future Work 


We have presented the framework of our semi-automatic approach to content analysis of qualitative data, 
and explained the two design strategies we have adopted for our system: 1) tuning the ML model 
performance to emphasize recall rather than precision, and 2) using active learning to continuously train 
models to yield better results over time. Using this proposed approach, our ultimate goal is to help 
computers and humans (i.e., social scientists) work closely together to perform large-scale content analysis 
of qualitative data in a reliable fashion. This paper reports preliminary findings from the implementation 
of our first design strategy to optimize the configurations of ML models for initial automatic coding. 

As part of our future work, we will continue to experiment with different features and model 
parameters that can further improve the recall of the results for each code of interest. We are currently 
working on the implementation of our active learning strategy through the creation of silver data being 
incrementally fed back as training sets for model enhancement. Finally, we hope to encourage more social 
scientists to help us pilot test this system with other kinds of research data. 
Abstract 

This poster proposes the use of Named Entity Recognition as a heuristic tool for improving manual 
document classification. This technique was developed as part of a project studying collaborative work 
via the acknowledgment statements found in a corpus of formally published journal articles. We 
demonstrate how uncertainty in our initial text mining results was ‘ground-truthed’ using Natural 
Language Processing tools in a quick-and-dirty fashion. To verify this technique’s validity, we offer some 
initial results from our larger study. 
1 Introduction 


The formally published scientific journal article has been mined, examined and evaluated in nearly every 
aspect: titles, authorship lists, abstracts, methods, figures, footnotes, and citations have all been used to 
better understand the way a field of science communicates, collaborates and makes new knowledge claims. 
Past work has shown that the “acknowledgments” section of a journal article can be especially 
helpful in shedding light on the often neglected or invisible work of collaboration (Cronin, Shaw and 
Labarre, 2003; 2004), especially in domains that depend on expert methodological knowledge and instrument 
building (Salager-Meyer et al., 2010). As part of an ongoing research project, we are exploring 
acknowledgment statements found in a large corpus of bioinformatics texts to better understand 
collaborations between the diverse peoples, technologies, and research tools that produce computational 
biological knowledge. In particular, we want to better understand how successful interdisciplinary 
collaborative arrangements distribute credit, how material resources are cited, and how computational and 
biological knowledge have subtly blended in this field over time. In a field like bioinformatics, research 
questions about acknowledgment and authorship practices are further complicated by the increased scale of 
collaboration, and the heterogeneity of scholarly products generated over the course of a research project 
(e.g. code, datasets, executable workflows) which are not easily attributable to one, or even a few “authors.” 
Understanding how credit is established and formally recognized in this field will help policy makers better 
understand and design incentives and reward structures so that both funding agencies and information 
systems developers might optimize cooperative work arrangements (Howison and Herbsleb, 2011; 2013). 
Our work diverges from previous studies of acknowledgment in some important methodological 
ways. Past studies relied upon the manual extraction of bibliographic data and the labor-intensive 
annotation of acknowledgment texts for the purposes of later classification (Giles and Councill, 2004, being 
a notable exception). Here we present our first steps towards applying natural language processing (NLP) 
techniques, as well as text mining methods, to extract acknowledgment texts from a corpus of documents 
gathered from the PubMed Central Open Access collection. During this phase of research we have focused 
on finding economical ways to increase the speed of our classifications without sacrificing accuracy or 
reliability. In that vein, our research questions include the following: 


• With little to no customization, can NLP tools like the Stanford Named Entity Recognizer (Stanford 
  NER) help us initially evaluate the quality of a corpus of acknowledgment statements? And can 
  they identify “entity rich” acknowledgments on which we should focus our initial analysis? 

• How effective are general, out-of-the-box NLP tools at recognizing entities in a domain-specific 
  corpus (such as bioinformatics)? 

• How can we best leverage tools that deliver quantitative results (e.g., number of entities per 
  acknowledgment statement) to support or aid further qualitative enquiry? 


2 Methods 


2.1 Corpus Construction 


We assembled a representative collection of bioinformatics texts from PubMed Central’s Open Access 
(PMC-OA) corpus. The PMC-OA includes the full text of completely open access journals, and the NIH 
portfolios of other paid access journals. We selected texts from two high-impact, open access bioinformatics 
journals (PLoS Computational Biology (n=2776) and BMC Bioinformatics (n=5765)) and one high-impact, 
limited access journal (the NIH portfolio from Bioinformatics (n=1200)) (Table 1). Each article is encoded 
in .nxml format, utilizing Z39.96, the Journal Article Tag Suite (JATS). 


Year    Bioinformatics  BMC Bioinformatics  PLoS Comput Biol  Total 
2000    -               1                   -                 1 
2001    -               9                   -                 9 
2002    -               40                  -                 40 
2003    -               66                  -                 66 
2004    -               209                 -                 209 
2005    -               371                 71                442 
2006    -               633                 169               802 
2007    1               599                 251               851 
2008    144             731                 298               1173 
2009    269             729                 394               1392 
2010    279             845                 422               1546 
2011    201             719                 426               1346 
2012    242             601                 530               1373 
2013    64              212                 215               491 
Total   1200            5765                2776              9741 


Table 1: All articles in the corpus were published between 2000 and 2013; n=9741. 


2.2 Text-mining acknowledgments 

Utilizing BeautifulSoup¹, a Python library that supports HTML and XML processing, we wrote a series of 
scripts to extract acknowledgments sections from each article². Because of PMC-OA’s use of the JATS 
markup, extraction of these statements was straightforward for the majority of our sampled articles (5897), 
which encoded their acknowledgment statements with the JATS <ack> tag, intended to specifically 
designate acknowledgment text. 

¹ http://www.crummy.com/software/BeautifulSoup/ 
² Code available at https://github.com/akthom/ParatextsAndDocumentaryPractices 

We found that a large portion of the articles encoded their acknowledgment statements using a combination 
of the more general <back> and <sec> tags, which are catchalls for much of an article’s back matter and 
for any discrete section of an article, respectively. Our more general script, which extracted the contents of 
both <ack> and <back> tags, pulled an additional 2377 sections of text (total statements extracted: 8427, or 
86.5% of the total corpus), with an estimated 1% error rate. We also extracted each article’s author list, 
and tallied the total number of authors per article (see Figure 1). 
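
The two-stage extraction described above can be sketched as follows. This is an illustration only and is
not the project's published script: it swaps BeautifulSoup for Python's standard-library
`xml.etree.ElementTree`, and the function names and the title-keyword fallback are assumptions made for
the example. It also assumes each .nxml file parses as well-formed XML without undefined entities.

```python
import xml.etree.ElementTree as ET

ACK_KEYWORD = "acknowledg"  # hypothetical title filter for sections lacking an <ack> tag


def _paragraph_text(element):
    """Join the text of all <p> descendants, collapsing runs of whitespace."""
    paragraphs = ["".join(p.itertext()) for p in element.iter("p")]
    return " ".join(" ".join(paragraphs).split())


def extract_acknowledgment(nxml_text):
    """Return acknowledgment text from a JATS article string, or None if absent."""
    root = ET.fromstring(nxml_text)

    # Preferred case: a dedicated <ack> element anywhere in the article.
    ack = root.find(".//ack")
    if ack is not None:
        return _paragraph_text(ack)

    # Fallback: a <sec> inside <back> whose <title> mentions acknowledgments.
    for back in root.iter("back"):
        for sec in back.iter("sec"):
            title = sec.findtext("title", default="").lower()
            if ACK_KEYWORD in title:
                return _paragraph_text(sec)
    return None
```

Returning None for articles with neither pattern makes it easy to count how many documents fall outside
both extraction rules.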


2.3 Named Entity Recognition 


After text mining the acknowledgment statements from our corpus of bioinformatics documents (n=9741), 
we parsed the texts with the Stanford Named Entity Recognizer (Stanford NER; Finkel, Grenager & 
Manning, 2005) using a four-class model trained to recognize and tag persons, organizations, locations and 
miscellaneous “other” entities. We then manually reviewed a small random sample of the results (n=100) 
to assess the NER’s efficacy. 
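
Once statements carry inline entity tags, counting entities per statement is straightforward. The sketch
below is an assumption-laden illustration, not our pipeline: it does not call the Stanford NER itself, it
assumes tagged output of the form `<PERSON>...</PERSON>` with the four class labels shown, and the sample
sentences, function names and the `min_entities` cut-off are hypothetical.

```python
import re
from collections import Counter

# Matches inline tags such as <PERSON>Jane Doe</PERSON>.
ENTITY_PATTERN = re.compile(r"<(PERSON|ORGANIZATION|LOCATION|MISC)>\s*(.*?)\s*</\1>")


def count_entities(tagged_text):
    """Count tagged entities per class in one inline-tagged statement."""
    return Counter(label for label, _ in ENTITY_PATTERN.findall(tagged_text))


def entity_rich(tagged_statements, min_entities=3):
    """Return indexes of statements with at least min_entities tagged entities."""
    return [
        i for i, text in enumerate(tagged_statements)
        if sum(count_entities(text).values()) >= min_entities
    ]


# Hypothetical tagged statements for illustration.
tagged_statements = [
    "We thank <PERSON>Jane Doe</PERSON> and the <ORGANIZATION>University of Arizona"
    "</ORGANIZATION> in <LOCATION>Tucson</LOCATION>.",
    "We thank the anonymous reviewers.",
]
rich = entity_rich(tagged_statements)
```

A per-statement count of this kind is one way to flag “entity rich” acknowledgments for closer manual
reading.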


3 Results 


Overall, the Stanford NER identified 21985 unique persons, 30223 organizations, 10444 locations, and 5423 
misc entities. After manually reviewing results from a sample of acknowledgment statements we found that 
the person entity tagger was by far the most accurate, and helped us further explore who was 
acknowledged, and how often. While the organization tagger worked fairly well (with over 60% accuracy in 
our reviewed sample), it would sometimes parse organizations with compound names into more than one 
entity (e.g., “Center for <ORGANIZATION>Insect Science</ORGANIZATION> at the 
<ORGANIZATION>University of Arizona</ORGANIZATION>”). Misc entities proved unreliable and 
too difficult to assess: the Stanford NER often erroneously tagged adjectives like “Open Access” and 
“Dutch” as entities, while also tagging entities that could arguably be classified as organizations, such as 
the “OBO Edit Working Group”. We do, however, note that the misc tagger did identify a number of 
computing facilities and software packages as entities, giving us hope that the method could be altered to 
automatically extract computational entities in the future. 

We compiled a list of the most commonly acknowledged persons in our corpus, and then tried to 
identify each person’s title and institutional affiliations using author affiliations from the articles themselves, 
and then generic internet searches to further flesh out each person’s role within an institution (Table 2). 


Name              # ack  Job title 
Elena Rivas       16     Janelia Senior Scientist, Howard Hughes Medical Center* 
Vasant Honavar    11     Professor of Computer Science and head of Artificial Intelligence Research Lab, 
                         Iowa State University* 
Burkhard Rost     10     Computational Biologist and Computer Scientist, Technical University of Munich* 
Chris Mungall     10     Bioinformatics Scientist, Lawrence Berkeley National Lab 
Gary Bader        10     Professor of Molecular Genetics and Computer Science, The Donnelly Centre, 
                         University of Toronto* 
Terry Mark-Major  10     Business Manager, University of Tennessee Health Science Center 
Alex Skrenchuk    9      IT Manager, Stanford Center for Biomedical Informatics Research 
Alexander Zien    9      Research Scientist, Max Planck Institute for Intelligent Systems 
Eran Segal        9      Professor and Computational biologist, Weizmann Institute of Science* 
Isobel Peters     9      Senior Project Manager, BioMed Central 

* appears to manage her/his own lab 


Table 2: The ten most frequently acknowledged individuals in our corpus. 
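
A ranking like the one in Table 2 can be produced by tallying person entities across statements. The
following is a minimal sketch under stated assumptions: the function name is hypothetical, names are
matched as exact strings after whitespace normalization (no disambiguation of name variants), and each
person is counted at most once per article, which is a choice made for this example and may differ from
how our counts were produced.

```python
from collections import Counter


def top_acknowledged(persons_per_article, n=10):
    """Rank person names by the number of articles that acknowledge them."""
    counts = Counter()
    for persons in persons_per_article:
        # Use a set so a name repeated within one statement counts once.
        counts.update({" ".join(name.split()) for name in persons})
    return counts.most_common(n)


# Hypothetical person entities extracted from four acknowledgment statements.
persons_per_article = [
    ["A. Example", "B. Sample"],
    ["A.  Example"],
    ["B. Sample", "B. Sample", "C. Placeholder"],
    ["A. Example"],
]
ranking = top_acknowledged(persons_per_article, n=2)
```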


We found that the ten most frequently acknowledged individuals were evenly split between researchers who 
are the director or lead scientist of a lab, and researchers who appeared to have support staff roles. In this 
case, NER-augmented classification helped us quickly see that our dataset contained information relevant 
to our broader research questions regarding the invisible work of collaborative projects, and encouraged us 
to further explore the relationship between authorship and acknowledgment within this corpus. 

We compared the number of authors per article per year to the number of acknowledged individuals 
per article per year, to get a sense of whether there were any noticeable authorship or acknowledgment 
trends within bioinformatics publications more generally (Figure 1). 
Figure 1: Average number of authors per article per year compared to the average number of 
acknowledged individuals per article per year. 


Interestingly, we noted slight downward trends in the number of acknowledged individuals per article per 
year, apparently corresponding with slight upward trends in the number of authors per article per year. 
One possible explanation for this trend is that the BMC Bioinformatics and PLoS Computational Biology 
collections both include editorial matter in addition to peer reviewed journal articles, and the PLoS corpus 
also includes conference proceedings; thus the downward trends in number of acknowledged persons per 
article could be the result of increased inclusion of articles without acknowledgments sections, thereby 
“watering down” our results and making it appear as if the number of acknowledged individuals is 
decreasing. 
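
The “watering down” effect is simple arithmetic, shown here with invented numbers. The counts below are
hypothetical and are not drawn from our corpus; the point is only that adding items with no acknowledgments
section lowers the per-article average even when practice in research articles is unchanged.

```python
def mean_acknowledged(counts_per_article):
    """Average number of acknowledged individuals over a list of articles."""
    return sum(counts_per_article) / len(counts_per_article)


# Hypothetical year: 100 research articles, each acknowledging 4 people.
research_articles = [4] * 100

# Same articles plus 25 editorials or proceedings items with no acknowledgments.
with_extra_items = research_articles + [0] * 25

mean_research_only = mean_acknowledged(research_articles)  # 4.0
mean_diluted = mean_acknowledged(with_extra_items)         # 3.2
```

Separating publication types before averaging would distinguish this dilution from a real change in
acknowledgment practice.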
This has encouraged us to look at differences between types of publications and who, or what, 
was acknowledged; our future work will explore how acknowledgment and authorship differ between regular 
publications, software publications (somewhat unique to bioinformatics publishing) and conference 
proceedings. Using NER as a rough classification heuristic allowed us to home in on this area relatively 
quickly, and sensitized us to this relationship for future work. 


4 Conclusions and next steps 


We have found that using NLP tools in a heuristic way can be quite helpful in quickly evaluating the 
relevance of a corpus for further, more rigorous analysis — and furthermore, for identifying future directions 
in the development of named entity recognizers. In the context of our larger project, use of NER tools 
helped us quickly determine the relevance of bioinformatics acknowledgment statements to studies of 
collaboration, and to determine whether or not the number and types of named entities would warrant 
further manual classification. 

This quick and dirty work encouraged us to continue analyzing our named entities in conjunction 
with our manual classification of acknowledgment types and tropes. It also helped us recognize the 
important relationship between acknowledgments and authorship statements. In future work we hope to 
apply our methods to a more diverse corpus of acknowledgment statements, to further explore underlying 
reasons for the above trends in authorship and acknowledgment rates, and to examine the relationship 
between article type, editorial policy, and acknowledgment practices. Additionally, we hope to explore 
customization of a named entity recognizer specific to the needs of this work; an NER designed to identify 
computing facilities and software would not only aid us in our research, but could also more generally 
support scientometric analysis of the impact of computational resources. 

Finally, we note that named entity recognition may provide publishers and researchers alike with 
a way to augment existing text encoding schemas, such as JATS. While the JATS markup facilitates more 
precise entity extraction, it is unrealistic to expect publishers (and text encoding schema developers) to 
encode all possible entities of interest. Post hoc named entity extraction can supplement 
metadata-facilitated information extraction efforts, particularly in fields like bioinformatics, in which authorship and 
acknowledgment practices may be rapidly evolving. 
Abstract 

High turnover rates in online communities suggest the need for measures that move non-sustained 
community members towards sustained participation. Non-sustained members actively seek inclusion 
opportunities, but are often disappointed by the lack of access to existing members. One method of 
inclusion can be contact with existing members through dialogue on discussion boards. Understanding 
the structure of interactions between sustained and non-sustained members can inform new strategies 
to address the high turnover rate and ensure community longevity. In this poster, we analyze the network 
structure of newcomer and existing member interaction through discussion posts. Through analysis of a 
citizen science community we ask: Are non-sustained and sustained participants engaged in 
conversations? The researcher analyzes the topological features of an affiliation network and centrality 
measures to determine the extent of interactions between these two groups. Finally, the researcher 
presents strategies to engage non-sustained participants in online communities. 


Keywords: affiliation networks, network analysis, citizen science, newcomers, online communities 

Citation: Jackson, C. (2014). Event Based Analysis of a Citizen Science Community: Are New and Non-sustained Users Included? 
In iConference 2014 Proceedings (p. 1139-1144). doi:10.9776/14405 

Copyright: Copyright is held by the author. 

Acknowledgements: This project was partially supported by the US National Science Foundation under grant number 1211071. 
I would also like to thank the Zooniverse Research team at Adler Planetarium and Syracuse University 


Contact: cjacks04@syr.edu 


1 Introduction 


Sites relying on volunteer contribution must maintain an active pool of participants to ensure work is 
completed. Burke (2009) noted the high turnover rates, especially for newcomers to online communities, 
which highlight the importance of early newcomer involvement in the community. Participant access and 
interaction with sustained users and more knowledgeable community members such as moderators and 
domain experts serves to orient non-sustained users to communities of practice and thus sustain their 
membership in them. These conversations and acknowledgment of work become essential for building 
relationships with community members (Griffin, Colella, & Goparaju, 2000) and enhancing participant 
learning of community norms. 

Burke et al. also suggest newcomers actively look for ways to be included in a community, and one 
way in which they can be involved is by posting a message and assessing their role in the community 
through replies (Burke, Kraut, & Joyce, 2009). The same research also showed community responsiveness 
would often lead to sustained participation. A similar study (Arguello, 2006) found newcomers to Usenet 
groups are more likely to come back for subsequent visits if others reply to their comments. It is also known 
that the effects on newcomers’ sustained community involvement are stronger if more sustained users respond 
to newcomers (Kraut & Resnick, Under contract). Online communities vary in their responses to newcomers 
and some offer guides on how to build successful relationships with newcomers, for the purposes of fostering 
their confidence in contributing. In fact, Wikipedia has a page dedicated to how to handle newcomers 
titled “Wikipedia: Please do not bite the newcomers”, which gives instructions on newcomer interaction. 

Given the importance of newcomer interaction, this research is concerned primarily with the 
responsiveness of a community of practice to non-sustained users in a citizen science project, Planet Hunters, 
from the Citizen Science Alliance’s Zooniverse suite of projects. In this community, citizen scientists, 
amateurs and professionals alike, are asked to annotate images to identify extrasolar planets. An example of 
the interface can be found in Figure 1. Participants are asked to identify dips in light emissions from 
stars, which could indicate the presence of transiting planets. After identifying the dips, users are 
prompted to talk with other community members about the image they annotated (Figure 2). Participants 
can see what others have said about the image and, more importantly, ask questions about the image features, 
which contributes to their understanding of the task and knowledge of the science behind the project, making 
this an important place for the exchange of information. 


Figure 1: Annotation Interface 

Figure 2: Discussion Interface 


Traditional network analysis has focused on characteristics of nodes while removing context from the 
interaction. Combining people and the context through which they are connected could provide additional 
insight and allows one to make claims about the structure of the network and integrations with community 
participants (Easley and Kleinberg, 2010). Affiliation networks allow us to represent social interactions 
among collections of actors (Carrington, Scott, & Wasserman, 2004) through shared membership in a 
contextual manner. The research analyzes discussion posts to see if newcomers and sustained users are 
conversing in a joint space. This method supports different perspectives on linkages between actors and 
events (Carrington, Scott, & Wasserman, 2004). In this case, membership in discussions serves to link 
actors to collectives in which they participate and link non-sustained participants to sustained community 
members. One of the earliest examples of affiliation networks was presented in a study by Davis, Gardner, 
and Gardner (1941), where networks of southern women attending parties were analyzed to determine class 
structure. In this research, using affiliation networks to characterize the network structure of discussion and 
participants is an appropriate method of analysis as we seek to answer our research question: Are 
non-sustained and sustained users engaged in conversations? This work is purely descriptive and seeks to 
understand how one category of users is related to another. 


2 Methods 


The researcher used Cytoscape to construct an affiliation network of a dataset consisting of logs of discussion 
board posts from non-sustained participants and sustained users who appeared in the same posts during 
July 2013. Data was collected from members who are defined as non-sustained and the sustained 
participants who commented on their posts. In the case of Planet Hunters, sustained users are described as 
those who have made more than 1000 contributions in the form of image classifications. The choice to 
employ 1000 classifications as a distinctive feature which separates non-sustained and sustained participants 
emerged from interviews with sustained users, who mentioned that only after 1000 classifications did they 
become competent in the task. Information about the number of classifications a user contributed was also 
included in the dataset. 
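
The construction of the affiliation network can be sketched in a few lines. This is an illustration under
stated assumptions rather than the Cytoscape workflow used in the study: the record format, the user
names and the function names are hypothetical, and users with 1000 or more classifications are treated as
sustained, following the grouping in Table 1.

```python
from collections import defaultdict

SUSTAINED_THRESHOLD = 1000  # classifications; boundary assumed inclusive


def user_group(classification_count):
    """Label a user by the 1000-classification threshold."""
    if classification_count >= SUSTAINED_THRESHOLD:
        return "sustained"
    return "non-sustained"


def build_affiliation_network(comment_log, classifications):
    """Build a two-mode network from (user, post_id) comment records.

    Returns (edges, post_members): directed user -> post edges labeled with
    the commenter's group, and the set of users attached to each post.
    """
    edges = set()
    post_members = defaultdict(set)
    for user, post_id in comment_log:
        group = user_group(classifications.get(user, 0))
        edges.add((user, post_id, group))
        post_members[post_id].add(user)
    return edges, dict(post_members)


def mixed_posts(post_members, classifications):
    """Return posts in which both groups of users appear."""
    mixed = []
    for post_id, users in sorted(post_members.items()):
        groups = {user_group(classifications.get(u, 0)) for u in users}
        if len(groups) == 2:
            mixed.append(post_id)
    return mixed


# Hypothetical log records and classification counts.
comment_log = [("ann", "p1"), ("bob", "p1"), ("cat", "p2"), ("ann", "p3")]
classifications = {"ann": 4087, "bob": 41, "cat": 12}

edges, post_members = build_affiliation_network(comment_log, classifications)
shared = mixed_posts(post_members, classifications)
```

Posts returned by `mixed_posts` are the discussion events in which the two groups actually meet, which is
the quantity the research question asks about.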


3 Results 


The network is visualized in Figure 3 and exhibits one large network and a number of isolated nodes. 
The thickness of edges represents comments from users who are sustained as established by our threshold of 
1000 classifications or greater. The network comprises 1395 distinct nodes and 1343 directed edges 
extending from nodes of participants. It is important to note that since the graph is directed no edges 
extend from discussion posts. The nodes in Figure 3 represent discussion posts and participants. The edges are 
weighted for participants with over 1000 classifications and the weights are further described by coloring 
the edges according to the number of classifications within that group, with the color scaling from darkest to 
lightest as the number of classifications ascends. 


Figure 3: Affiliation Network of User Comments. 


Interesting topological features have emerged in the network. Looking first at the primary network, we see a 
number of edges extending from the network that are not bold, and many bold edges in the central 
part of the network. Most weighted edges are closely knit and few unweighted edges exist in the central part 
of the network, which suggests close ties between sustained users who comment. Table 1 also supports the 
lack of appearance of sustained users in conversations involving newcomers. Given the production from 
sustained participants, it is appropriate to expect a small number of sustained members to seek to include 
non-sustained participants. Another topological feature of the network is the appearance of local isolates 
(Figure 4). Local isolates are nodes that appear in the network but are removed from the primary network 
structure, in this case conversation. This is an important finding, and further analysis of the local isolates 
shows the appearance of 6 sustained users and 96 non-sustained users exhibiting this behavior. 


Figure 4: Local Isolates 


There are a number of non-sustained users connected to the primary network, and the structure of those 
interactions most often resembles the node interaction in Figure 5. These participants most often have a greater 
number of posts, which increases the likelihood that sustained users will notice them. Many users in the 
large network are included only because of one interaction with a sustained user, and another smaller group 
of participants is included because their path runs through another non-sustained user to a sustained user. 


Classification Group    Users  Discussion Posts  Classifications  Median Classifications 
Non-Sustained (0-999)   347    1039              52,457           41 
Sustained (1000+)       57     211               1,008,028        4087 


Table 1: Group Community Production 


With this type of graph we can also examine the structure to look at the differences 
in centrality measures for the two groups (sustained and non-sustained participants). The researcher looks, 
in particular, at three measures of centrality: degree, closeness, and betweenness. These measures explain how 
nodes of participants are integrated in the network. Table 2 shows the three measures of centrality 
important to helping determine the level of inclusion for each group of nodes. 

Degree centrality, which measures the number of direct connections to a node, is lower for 
non-sustained users than for sustained users (Table 2), which suggests non-sustained participants are being 
engaged less frequently in conversations and are not included to the degree of sustained participants in the network. 
Closeness, which is a measure of how close a node is to other nodes in the network, is 0.19 for sustained 
users and 0.6 for non-sustained users. Again, sustained users are not seen as being close to other nodes in 
the community, and the value is likely low for sustained users because of the lack of contact with isolates. Lastly, 
betweenness measures how a node exists as a bridge in the network. Again, non-sustained users show a 
betweenness value of 0.03, which is influenced by isolates which are disconnected from the primary 
network. This measure suggests sustained participants are essential in connecting nodes in the graph. 


              Non-Sustained  Sustained 
Degree        2.99           3.70 
Closeness     0.602          0.19 
Betweenness   0.156          0.03 


Table 2: Network Group Scores 
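
Group-level scores of this kind can be computed directly from an adjacency list. The sketch below is a
hypothetical illustration and not the procedure behind Table 2: it treats the network as undirected,
defines closeness as the number of reachable nodes divided by the total distance to them (other tools may
normalize differently), omits betweenness for brevity, and uses an invented six-node network.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def degree(adjacency, node):
    """Number of direct connections to a node."""
    return len(adjacency[node])


def closeness(adjacency, node):
    """Reachable nodes divided by total distance to them; 0.0 for isolates."""
    distances = bfs_distances(adjacency, node)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


def group_mean(values_by_node, group_of, group):
    """Average a node-level score over the users in one group."""
    values = [v for node, v in values_by_node.items() if group_of[node] == group]
    return sum(values) / len(values)


# Hypothetical network: users u1-u3 commenting on posts p1-p3.
adjacency = {
    "u1": {"p1", "p2"}, "u2": {"p1"}, "u3": {"p3"},
    "p1": {"u1", "u2"}, "p2": {"u1"}, "p3": {"u3"},
}
group_of = {"u1": "sustained", "u2": "non-sustained", "u3": "non-sustained"}

degrees = {u: degree(adjacency, u) for u in group_of}
closenesses = {u: closeness(adjacency, u) for u in group_of}
mean_degree_non_sustained = group_mean(degrees, group_of, "non-sustained")
```

Under this particular definition a user in a small disconnected component (such as `u3`) receives a high
closeness score, so the treatment of isolates should be stated whenever group averages are compared.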
4 Conclusion 


The research focused on topological features of an affiliation network of discussion events and participants 
in an online community. A month of log data shows that while non-sustained users often begin discussions, their 
comments often do not receive responses from sustained members, suggesting the two groups are not engaged in 
conversations in the same space. New systems and designs should incorporate measures to identify 
non-sustained participants in communities where the exchange of information and knowledge is essential to 
performing work. Acknowledgement of non-sustained participant work is essential for building the 
community and building participant confidence in their contributions, which could move more non-sustained 
participants to a central and sustained role in the community. Future work will focus on analysis of 
interactions to determine if users who have interacted with sustained users persist and eventually become 
sustained. One emergent area of research this work seeks to contribute to in the future is dynamic analysis 
of online communities using network analysis. Determining how communities form over time and how user 
roles change could lead to clues about important milestones and events in community formation. 
Abstract 

Viral videos have become very popular and influential in our technology-driven world. They have a strong 
influence on the production of popular user generated TV shows and websites. Our team explored what 
makes videos go viral. We aimed to understand the driving factors behind what makes a video go 
from one view to thousands of views. We focused particularly on comedic amateur videos that have gone 
from simply being posted to social media sites such as YouTube to being aired on popular television 
programs like Tosh.0 and Ridiculousness. We achieved our goal by performing a content analysis of viral 
videos, conducting a survey of viewers and organizing an interview with a viral video celebrity. We 
present our project background, methodology for collecting data about viral videos, 
content analysis, and concluding factors in viral amateur comedic videos. 
1 Introduction 


Viral videos are videos that have widespread popularity of at least 100,000 views by traveling from person 
to person through email, instant messages, and media-sharing websites (Wallsten, 2010). They have a strong 
influence on the production of popular user generated TV shows and websites. However, the factors that 
make a video more likely to go viral are often mysterious. The goals of our project were to increase our 
understanding of the mechanisms behind viral videos and to learn the factors necessary for 
making a video go viral. 

Viral media continues to grow as a segment of the job market, as many companies, politicians, and 
organizations are taking advantage of user generated content to promote different products or brands. In 
recent years numerous companies have established a strong market presence in viral media marketing, such 
as YouTube, Reddit, CollegeHumor.com and funnyordie.com. These companies and organizations help 
create and promote viral media by allowing content to be shared in many different ways, across multiple 
online platforms. For example, on YouTube a user is able to email, post on Facebook, Tweet, and download 
videos. These options enable users to take advantage of simple and fast ways to share videos and content. 
Despite the widespread use and viewing of viral videos, the current literature does not address what factors 
contribute to video sharing. Wallsten (2010) confirms this by stating, “there has been almost no systematic 
empirical research on the factors that lead viral videos to spread across the Internet”. 

This study focuses on comedic amateur videos that go from being posted without fanfare on social 
media sites like YouTube to being aired on popular television programs like Tosh.0 and Ridiculousness, or 
get posted to aggregator websites such as CollegeHumor.com and funnyordie.com. We focused our attention 
on comedy for three reasons. First, comedy is a way to have cross-platform mobility, and is a bridge between 
traditional media conglomerates/popular media norms and transgressive online material (Gurney, 2011). Second, 
a pilot investigation identified comedy as an important factor for many of the most-viewed videos on the 
internet. Whether the comedy style is dry, sarcastic, goofy, political, informative, accidental, or eccentric, 
there is usually some form of humor involved in the internet’s most viewed amateur videos. This is also 
supported in the literature by Gurney (2011), who highlights the importance of comedy in 
viral videos. Lastly, Tosh.0, Ridiculousness, CollegeHumor.com and funnyordie.com are some of the most 
popular TV shows and websites watched and viewed daily. Their content is based on amateur comedic 
videos that have gone viral through user generated content. People often like to share videos that they find 
funny or entertaining with everyone around them so that they can share in the joy (Berger and 
Milkman, 2009). Over the course of this study, we investigated what it took for a video to go viral. 


2 Methods 


To achieve the goals of our project we conducted a survey of viral video viewers, interviewed a viral video 
celebrity, and performed a content analysis of a sample of viral videos. 


3 Survey studies 


The purpose of our survey was to gain a better understanding of methods of video sharing, and also to 
research particular reasons behind the popularity of comedic viral videos among users of social media sites. 
We conducted our survey among undergraduate students aged 18 to 22. We chose this age group because 
of the group’s familiarity with viral videos and the internet and because we had the most access to this 
group. Given the diversity of our research team, we were able to administer our survey on the campuses of 
four large public universities, in the southeast and mid-Atlantic. This allowed us to conduct our survey 
among a large sample of users with great diversity. 

We employed snowball sampling as an approach to recruit our survey sample, starting with our 
personal networks and expanding the sample to those we did not know personally through friend of friend 
connections. We aimed to pass the survey around each campus through clubs, organizations, and sports 
teams. We also worked to recruit participants by posting advertisements in open public areas on our 
campuses. We administered the survey both online and in person to increase our response rate. 

The survey included demographic questions such as age, race and gender. It also included questions 
about participants’ habits regarding the ways in which they watch videos such as “Where do you watch 
videos?”, “How do you share videos?”, “How are videos shared with you?”, and “How were you made aware 
of (specific video)?” We provided a sample of highly popular viral videos in our survey and asked 
participants whether they recognized the videos, if they had shared the videos and why, how many times 
they had replayed the videos, and what they liked about the videos. 

These questions helped us gain insight into video sharing behavior and a better understanding of the
key factors that influence users to share or watch a video. The survey was conducted over a four-month
period in 2012.


4 Interview 


To provide contextual understanding of viral videos from the creator’s point of view, we conducted a
semi-structured interview with Scarlet from the viral video “Scarlet Takes a Tumble”. This interview was
conducted to help understand the timing, platform of sharing, spread, purpose, reactions, and success of 
the original author of the video. The interview was conducted via email. Scarlet was more than happy to 
answer our questions and provided us with some interesting and surprising findings which are discussed 
further below. We chose to interview her because we wanted to understand how much of an impact the 
owner of a video has on it going viral and if an owner has any hand in keeping the video relevant and 
popular. One interesting answer she gave us was how exactly the video was placed on the web. She 
explained, “My sister told me to post it so she could watch it but set it to private so that only she would 
see it. I posted but I forgot to set it to private and it just blew up from there” (Renyolds, 2012). 


5 Content analysis


There are many different factors that can potentially affect the popularity of a video on social media, 
including timing of video, design, content, shock factor, title of the video, and the method of distribution. 
In our research, we attempted to collect information about these factors by performing a content analysis 
of the videos to understand the factors that contribute the most to the popularity of a video. The content 
analysis involved two main phases. In the first phase we worked to examine a small collection of well-known 
viral videos to identify common features among them. In the second phase, we worked to examine a larger 
collection of viral videos to assess which features from the list created during the first phase were present 
in the videos in order to gauge the importance of each factor. We looked to answer questions relating to
the different aspects of the videos. These aspects included what site they were from, the creator’s personal 
information, the video posting date, the video genre, and categories such as home videos of cute kids, pranks, 
political, or ignorant videos. We examined videos that had at least 100,000 views. 

For example, we examined a specific parody video titled “Overly Attached Girlfriend”. This short 
clip, created in 2012, mocked the idea of overly attached girlfriends to the background music of the popular 
Justin Bieber song “Boyfriend”. The video went viral in a matter of days and sparked many
spin-off viral videos and memes. Two factors seem to have contributed most to the video going viral:
sharing via social media and the creator’s own response to the videos.


6 Results and Conclusion 


We identified several factors associated with the viral sharing of amateur comedic videos. These factors 
include: shock factor, ability to parody, target audience, timing, relevance of topic, and popularity of the 
initial sharer. Based on the current literature, we focused our attention on comedic videos. Berger and
Milkman (2010) concluded that the “emotional engagement” of the audience is key to the virality of any online
video. That is to say, a video must be able to elicit some type of emotional response in the viewer in order
for it to achieve viral status. Berger and Milkman found that videos that elicited sadness were shared less
often than videos that elicited happiness or anger. When videos evoke anger, viewers want to share
their outrage. When videos cause viewers to respond with a combination of emotions, for example humor
and shock, they are more likely to be shared.

In addition to the importance of comedy in viral videos, Gurney (2011) highlights how the structure
of the videos, such as the ease with which users can remix or parody the content, plays a key role in their
popularity. The ability to parody is important because users will share the parodied version with friends.
After viewing the parody, friends will refer back to the original, increasing its number of views.
Another factor to consider when making a viral video is the popularity and reputation of
the initial sharer, who serves as the medium between the creator and
the viewers. If the sharer does not have a large enough audience, the video is less likely to reach
viral status. Also, if the initial sharer is not well respected by that audience, people are more likely
to skip the video or to think it is spam. Relevance and timing can also play a role in a video
achieving viral status. We found that videos focusing on current events or particular seasons were
more likely to gain views. For example, if someone makes a humorous video involving politics during
election time, the video stands a better chance of going viral. Another example is someone who creates a
Christmas or Thanksgiving themed video during the holiday season. As long as the video is related to
something that is currently relevant, it is more likely to draw attention from viewers.

In conclusion, although not every video possesses all of the factors listed, those possessing one
or more are more likely to go viral. The highest viewed videos possess combinations of the factors identified in our
study. 


Abstract

In the 2012 US presidential election, there was concern about voter turnout. Since Obama for America’s
use of social media during the 2008 presidential election, there has been growing speculation that social media
could become a medium for re-engaging citizens in politics. Hence, social media’s role in political engagement
and the nature of political engagement were examined via three analyses of Twitter data (i.e., posting
frequency, sentiment analysis, and social network analysis) and one survey study. The results
showed that Twitter’s impact on political engagement is simply about spreading awareness; it still
depends on whether open-minded, politically and civically interested users see the politically relevant
tweets.


1 Introduction


During the Presidential Election of 2012, even though overall voter turnout decreased, the voter turnout
of United States youth and minorities remained historically high (Morris, 2012). According to Pew Research
Center’s Demographics of Social Media Users, social media usage is associated with higher levels of youth
and minority participation. A greater percentage of social media users are young adults between the ages
of 18 and 29; African Americans and Hispanics are more likely than Whites to use social media;
and African Americans are more likely to use Twitter (Duggan and Brenner,
2013; Smith and Brenner, 2012). Our studies focused on the concept of the monitorial citizen as a more 
accurate representation of the nature of voters in the Presidential Election of 2012. Monitorial citizens are 
citizens who participate in politics when they feel that something requires attention or may become 
threatening (Zaller, 2003, p.118). Therefore, the historically high voter turnout of United States youth and 
minorities may reflect the perception of a need to address political issues important to these groups or 
perception of a threat in the past election (Morris, 2012). The concept of monitorial citizens may explain 
the contradiction between the presence of “slacktivists,” those who are passively engaged online via social 
media, and those who effectively use social media in their political engagement (Klafka, 2010 in Breuer and 
Farooq, 2012, p.4). We propose that during politically significant times (e.g., elections) social media would 
reflect and facilitate the arousal of monitorial citizens through activity frequency, sentiment, network
characteristics, and offline political engagement.


2 Method


Twitter was used due to its popularity as a social media medium, the presence of hashtags that are used to 
organize conversations, and its use alongside political events (e.g. communication of opinions during 
Presidential Debates via #debates, political fact checkers such as @politifact and @factcheckdotorg)
(Casserly, 2012; Memmot, 2012). The dataset used was purchased from Gnip, a social media data provider.
The data was filtered to include only Twitter communications (tweets) posted during the week before the
Presidential Election of 2012 (October 31, 2012 to November 5, 2012), in the English language, from users
located in the United States, and containing the following hashtags to establish the relevance of the tweet to
the Presidential Election of 2012: #Election2012, #Obama, #Romney. Gephi was used to conduct the
network analysis and calculate social network metrics: the Connected Components calculation identified 
unique Twitter communities within the large dataset, the Average Path Length calculation was used to 
determine the extent to which users tend to be connected with each other, the Average Degree calculation 
determined the number of interactions between users, the Eigenvector Centrality calculation determined 
whether users tended to interact with popular users, and the Average Clustering Coefficient calculation 
determined whether users preferred to interact with specific users. 
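
Two of the metrics above can be illustrated with a minimal sketch. It is not the Gephi computation used in the study: the user names and interactions are hypothetical, and the average degree follows one common convention for directed networks (number of edges divided by number of nodes), which is an assumption here.

```python
from collections import defaultdict, deque

# Hypothetical interactions (source user, target user); NOT from the study's dataset.
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
users = {"ann", "bob", "cat", "dan", "eve", "fay"}  # "fay" tweeted but never interacted


def average_degree(nodes, edge_list):
    """Average degree under the assumed directed convention: edges per node."""
    return len(edge_list) / len(nodes)


def connected_components(nodes, edge_list):
    """Group users into communities, ignoring the direction of each interaction."""
    neighbours = defaultdict(set)
    for src, dst in edge_list:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            user = queue.popleft()
            component.add(user)
            for other in neighbours[user] - seen:
                seen.add(other)
                queue.append(other)
        components.append(component)
    return components


print(average_degree(users, edges))             # 0.5
print(len(connected_components(users, edges)))  # 3
```

In this toy network, half an interaction per user and three separate components would both point to a weakly connected network, which is the kind of reading these metrics support.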


3 Results 


3.1 Study 1 


We examined the frequency of tweets posted daily leading up to the election to determine whether Twitter 
users perceived a sense of urgency and hence posted more. We found that Twitter activity steadily increased
from October 31st to November 4th. Furthermore, Twitter activity almost doubled from November 4th to
November 5th, the day before the election. We compared the tweet frequency of the swing states to the
tweet frequency of the United States overall. We found that Twitter users located in the eight battleground 
states (out of the fifty states) generated 28% of the overall tweets, a significantly large proportion of Twitter 
communications (CNN Wire, 2012). Our findings support the association of greater posting frequency with 
greater political participation because the battleground states (i.e., Colorado, Florida, Iowa, New 
Hampshire, Nevada, Ohio, Virginia, and Wisconsin) had some of the highest voter turnouts (CNN Wire, 
2012). Study limitations included the limited analysis time frame. 
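
The two quantities reported in this study, tweets per day and the share of tweets from battleground states, can be sketched as follows. The records are hypothetical stand-ins for the purchased dataset, and representing each tweet as a (date, state code) pair is an assumption made for illustration.

```python
from collections import Counter

# The eight battleground states named in the text, as postal codes.
BATTLEGROUND = {"CO", "FL", "IA", "NH", "NV", "OH", "VA", "WI"}

# Hypothetical (posting date, user state) records; NOT the study's data.
tweets = [
    ("2012-10-31", "OH"),
    ("2012-11-04", "CA"),
    ("2012-11-04", "TX"),
    ("2012-11-05", "FL"),
    ("2012-11-05", "NY"),
    ("2012-11-05", "CA"),
    ("2012-11-05", "TX"),
    ("2012-11-05", "AL"),
]


def daily_counts(records):
    """Number of tweets posted on each day."""
    return Counter(day for day, _ in records)


def battleground_share(records):
    """Fraction of all tweets posted from battleground states."""
    in_swing = sum(1 for _, state in records if state in BATTLEGROUND)
    return in_swing / len(records)


print(daily_counts(tweets)["2012-11-05"])  # 5
print(battleground_share(tweets))          # 0.25
```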

Figure 1: The number of Twitter communications that took place six days prior to the 2012 presidential
election. 

Figure 2: The percentage of Twitter communication six days prior to the 2012 presidential election by
state. 


3.2 Study 2 


In the field of social media and politics, there has not been a study that applies sentiment analysis to the
study of political engagement (e.g., Gayo-Avello, 2012; Choy et al., 2011). We examined the relationship between
sentiment expressed in tweets and political engagement. Three types of sentiment were measured in this
study using the Affective Norms for English Words (ANEW) list: urgency (arousal), efficacy
(dominance), and valence (Bradley and Lang, 1999). In line with the concept of the monitorial citizen, it was
expected that tweets in the swing states would contain a greater urgency sentiment. However, an analysis of the
standardized residuals of sentiment across all of the voting states showed no pattern of statistically
significant urgency, valence, or efficacy in the battleground states. At the same time, many of the solid
states (e.g., New York, California, Texas, Alabama,
Louisiana), which also had some of the lowest voter turnout, lacked significant sentiment (CNN Wire, 2012;
2012 Election Maps, 2012). Study limitations included the accuracy of the sentiment rating (e.g., literal
scoring may not detect sarcasm in tweets) and the limited time frame examined.
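
A minimal sketch of literal, lexicon-based scoring of the kind described above is given below. The ratings are placeholders on a 1 to 9 scale and are not actual ANEW values, and averaging the ratings of matched words per tweet is an assumption, since the text does not state how word ratings were combined.

```python
import re

# Placeholder ratings on a 1-9 scale; these are NOT actual ANEW values.
LEXICON = {
    "win": {"valence": 8.0, "arousal": 7.0, "dominance": 7.0},
    "fear": {"valence": 2.0, "arousal": 7.0, "dominance": 3.0},
    "vote": {"valence": 6.0, "arousal": 5.0, "dominance": 6.0},
}
DIMENSIONS = ("valence", "arousal", "dominance")


def score_tweet(text):
    """Mean rating per dimension over lexicon words found in the tweet.

    Returns None when no word of the tweet is in the lexicon.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[word] for word in words if word in LEXICON]
    if not hits:
        return None
    return {dim: sum(hit[dim] for hit in hits) / len(hits) for dim in DIMENSIONS}


print(score_tweet("Go vote, we can WIN this! #Election2012"))
# {'valence': 7.0, 'arousal': 6.0, 'dominance': 6.5}
```

Because each word is scored literally, a sarcastic tweet containing "win" would receive the same high valence as a sincere one, which is the limitation noted above.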

Figure 3: The Valence Sentiment six days prior to the 2012 presidential election in the U.S.
Figure 4: The Arousal Sentiment six days prior to the 2012 presidential election in the U.S.
Figure 5: The Dominance Sentiment six days prior to the 2012 presidential election in the U.S.


3.3 Study 3


A macro-level analysis involving Twitter users and their connections to other Twitter users was conducted 
via Gephi, a social network visualizer, to examine the presence of social capital in Twitter. In Twitter, 
connections within the network can occur via retweets of a user’s post, a user mention in a post, and a 
direct reply to a user. However, social network metrics indicated the lack of a strongly connected network. 
For instance, the average degree, which indicates the number of users each user was connected to, was
0.207. Thus, on average, users interacted with (replied to, retweeted, or mentioned) fewer than one other user.
Limitations included the limited analysis time frame and the lack of geographical tweet data to examine
political engagement across states.


3.4 Study 4 


We assessed political engagement and Twitter usage among college students in
California, New Mexico, Pennsylvania, New York, and Alabama. With permission, we adapted Vitak and
colleagues’ (2011) survey on Facebook use and political engagement for this study. Of the 89 survey
respondents, 39 were Twitter users. Only Twitter users were assessed. Pearson’s product-moment
correlation tests were performed to test for significant correlations between voting in 2012 and the
74 questions asked of the respondents. Self-reported voting behavior was significantly correlated with
an interest in politics (r = 0.424, p = 0.007). Disagreement with the statements “I can learn a lot
from people with backgrounds and experiences that are different from mine” (r = -0.279, p = 0.085) and
“I think it is important to get involved in improving my community” (r = -0.346, p = 0.03) showed significant
negative correlations with voting. Results indicated that interest, open-mindedness, and a belief in civic
participation encouraged political engagement. No significant associations between Twitter use and offline 
political engagement were found. Study limitations include the low response rate and the survey’s restriction 
to the college student population. 
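
For reference, the Pearson product-moment coefficient used in this study can be computed as in the sketch below. The response vectors are hypothetical and are not the survey data; coding voting as 1 or 0 and interest on a 1 to 5 scale is an assumption made for illustration, and the significance test (p-value) is not shown.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical responses; NOT the survey data.
voted = [1, 1, 0, 1, 0, 0]     # 1 = reported voting in 2012
interest = [5, 4, 2, 4, 3, 1]  # self-rated interest in politics, 1-5

r = pearson_r(voted, interest)
print(round(r, 3))  # positive: voters in this toy sample report more interest
```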


4 Conclusion 


Twitter’s impact on political engagement may depend on how often tweets are generated and whether
the tweets are seen by open-minded, politically and civically interested users. Hence, the relationship
between greater use of social media and greater political engagement may be that greater social media use
allows the dissemination of information to people who may be interested in or alarmed by the information.
Since the voter turnout of United States youth and minorities has remained historically high, it is possible
that, despite the lack of the expected connected social network and offline political engagement (which take
more time to cultivate), social media reflects and rouses a short-term fervor that often engages the monitorial
citizen. Future work should examine the following aspects of activities across all major social media
platforms during politically significant times: types of user-generated content (e.g. comments, web links, 
RTs), the demographics of social media users (e.g. location, ethnicity, age), sentiment in user-generated 
content, and social network characteristics of interactions. Together, such studies can provide a more
comprehensive view and reveal crucial patterns that lead to a clearer understanding of social media’s
role in political engagement.


Abstract

This study will: 1. Uncover the characteristics of the emerging 21st Century workforce reflected in the 
literature associated with an NSF data infrastructure project (DataONE). 2. Compare those characteristics
to the roles and responsibilities of school librarians articulated by their professional associations. 3. 
Compare those characteristics to the roles and responsibilities of school librarians articulated by education 
guidelines generally. This study contributes to the body of knowledge on the role of school librarianship 
in STEM education, particularly project-based learning, as well as to the broader scientific apparatus. It 
provides insight about how developments in K-12 education relate to scientific data infrastructure 
development. As knowledge workers, school librarians can play pivotal roles in STEM teaching and 
learning in ways that connect educational developments with the expectations of a science-based 
workforce (Subramaniam, Ahn, Fleischmann, and Druin 2012).


1 Introduction


This work examines the extent to which standards associated with the roles and responsibilities of school 
librarians and education in general align with characteristics of the workforce emerging in the 21st Century. 
This poster presents research that culls scholarly literature associated with NSF-funded DataONE to 
identify characteristics of this workforce. Publications in this literature are appropriate representatives of 
the workforce because of DataONE’s capacity to diffuse particular values and encourage particular practices 
related to data intensive, computational science. DataONE is both a sociocultural and technical 
infrastructure that enables science by allowing scientists to manage their data, share their data, and access 
other scientists’ data (Michener et al., 2012).

This research begins the exploration of “new” roles for school librarians by comparing related 
professional standards to the characteristics of a future-ready workforce. Numerous practical and theoretical 
frameworks have identified K-12 educational institutions as critical feeders of human resources into the 
information, science, and technology-driven global economy. However, little is known about how the 
professional standards that guide K-12 school librarians align with these characteristics. Also, while school 
librarians’ roles and responsibilities in STEM education in general are well-articulated in standards for 
librarianship, this same clarity is not found among the general standards and guidelines for K-12 education 
(e.g., YALSA 2013; Nextgenscience.org). The roles played by school librarians’ counterparts in higher
education who are involved in the DataONE Project, for example, suggest that school librarians can be
particularly useful in teaching and mentoring data curation (Tenopir, Birch, and Allard 2012). Librarians
filling these leadership roles would help meet the demand for scientific inquiry through project-based
learning opportunities (Nextgenscience.org).

Science and technology have been projected to be the engines that drive American innovation and 
productivity in the future (Carroll and Kandish 2010). Science is becoming increasingly computational and 
data intensive (Horvitz and Willia 2009). Thus, the ability to interface with data at various points in the
data life cycle is becoming a highly marketable skill (Michener et al. 2012; Halbert 2013) that can be
developed most effectively and efficiently by school librarians (Subramaniam, Ahn, Fleischmann, and Druin
2012). The data skills that librarians can impart would leverage the value of public education.
They would also leverage the value of publicly funded scientific data as students use real-world
data to think about complex problems and as experiences with these data during their formative years
contribute to long-term, marketable skills (Gavigan, 2012).


1.1 Research Questions 


1. What is the nature of the 21st Century Workforce reflected in the literature associated with the 
NSF data infrastructure DataONE? 
1.1. Is this nature reflected in standards for school librarians? 

2. Are the K-12 standards associated with the roles and responsibilities of school librarians in the 21st
Century represented in the DataONE literature? 


2 Methods 


The pilot study employed a content analysis research method using QDA Miner Software to analyze the 
abstracts and articles related to the NSF-DataNet:DataONE Project. QDA Miner is a qualitative tool that 
allows researchers to identify concepts and categories for analysis and generate relationships between terms 
(e.g., proximity and occurrence and clusters). 

Data Corpus. The data corpus at the time of this study consisted of 67 journal articles (December 
4, 2014) that were downloaded from “publications” on the DataONE.org website. Our pilot study examined 
and analyzed 25% of this corpus (additional data are being analyzed). 

Future work will involve refining the initial list of terms based upon the utility of these terms, 
based upon common terms identified during further analysis of the DataONE publications, and based upon 
terms extracted from standards of school librarianship. Also, we will map this list to relevant K-12 standards 
for school librarians. In addition, we will contextualize the results of this study using quotes related to these 
school and library standards and to scientific data infrastructure practices (Yang and Wildemuth 2009). 


3 Results and Analysis 


Terms extracted from the DataONE published literature on scientific data infrastructures were used for the 
initial analysis. We identified the following list of terms: “data management,” “data curation,” “education,”
“STEM,” “K-12,” “schools,” “media specialists,” “school librarians,” “resources,” “materials,”
“educational,” “tools,” “citizen science,” and “teachers,” among other terms. While infrastructure-oriented
terms such as data curation and data management appear frequently in the results of the content analysis, 
terms related to K-12 education exist in small quantities. However, phrases such as citizen science, which 
provide conceptual links between scientific data infrastructures and K-12 education, appear frequently. 

Preliminary content analysis of the DataONE publications reveals the following top frequencies: 
“data” (9907 instances), “research” (2993), “science” (1875), “management” (1792), and “information” 
(1660). Two and three-word phrases appear in the following frequencies: “data management” (1164), 
“research data” (564), “data sharing” (496), “data sets” (288), and “research data management” (214). 
Education-related terms appear far down the list, in the single digits. 
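
The kind of term and phrase counting reported above can be illustrated with a minimal sketch. It does not reproduce QDA Miner's processing: the tokenization rule and the sample sentence are assumptions, and phrases are counted across sentence boundaries for simplicity.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive words in the text (case-insensitive)."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


# Hypothetical sample text; NOT taken from the DataONE corpus.
sample = (
    "Research data management supports data sharing. "
    "Data management plans describe research data."
)

print(ngram_counts(sample, 1)["data"])                      # 4
print(ngram_counts(sample, 2)["data management"])           # 2
print(ngram_counts(sample, 3)["research data management"])  # 1
```

Running the same function over the standards documents would give the comparison counts, provided both sets of documents are tokenized the same way.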

We searched for these terms and phrases within documents that promulgate professional standards 
for school librarians (AASL 2010, AASL 2012a, AASL 2012b, YALSA 2013). “STEM” (182 instances, 0.7
percent) is the most common term in these documents. Among the terms and phrases identified as the most
frequent in the DataONE publications, only “research” (27, less than one percent) and variations of the
word “dataset” (6; percent not available, leftover term) appear among the school library/librarian
standards.


4 Conclusions 


This analysis resulted in a preliminary list of workforce characteristics as determined from the DataONE 
literature. These characteristics were mapped to relevant professional standards identified through 
professional associations related to school librarianship. Preliminarily, this mapping does not suggest an 
explicit integration between the characteristics of the emerging workforce and school librarians’ roles and
responsibilities.


Prior to the establishment of the U.S. Department of Agriculture in 1862, farmers in the U.S. had myriad
ways of sharing and communicating agricultural information that was rooted in experimental practice 
and based on years of experience. Farmers both needed and used that information — information they 
created, circulated, and consumed. The introduction of information work at the Department of 
Agriculture not only altered the kind and amount of information farmers had access to but effectively 
sought to redefine who the “experts” were through the production and dissemination of the results of 
applied scientific research conducted by scientists at the Department or work by others filtered through 
the institution. The vehicle for much of this information transfer was the annual reports of the 
Department. My research is an historical examination of the Department of Agriculture that looks 
specifically at its information functions from 1862-1888. Using the annual reports to identify and examine 
those functions, I situate that information work within the context of the emergence of the modern state 
and American empire, industrializing capitalism, and the history of information. 


1 Introduction


Information work at the newly formed United States Department of Agriculture relied heavily on surveys, 
developed and conducted by the department. These were sent to as many as 30,000 correspondents at a 
time, seeking data such as crop yields and soil conditions. Most of the informants were directly associated 
with agriculture throughout the United States and its territories. Many were farmers. In turn, the 
departmental research disseminated to farmers across the U.S. included information on new tools, more 
efficient farming practices, better seeds, new plants, and statistical information about crop yields, the 
weather, and market prices. The information work of the Department served as the foundation for what 
was, essentially, a national information policy. The vehicle for much of this information transfer was the 
annual reports of the Department of Agriculture. The annual reports were freely available to the public via 
the Department and the Congress, and mailed free of charge by the United States Postal Service (Fuller, 
1972; John, 1995). The annual reports were bound volumes ranging from 400 to 800 pages, with print runs 
that reached 400,000 by 1888. These unique, complex, and rich primary resources, all government
documents, are at the center of my research and are the primary source for my data.


2 Historical Context 


The second half of the 19th century was a period of dramatic change in the United States. The federal 
government was growing and building an economic and political infrastructure that asserted a national 
identity. Although we often associate this period with the growth of cities, industrialization, and factory 
work, it is worth noting that, according to the U.S. census of 1840, more than half of the nation’s population
was employed in agricultural work. The census of 1840 was the first federal census to include agricultural
statistics. Compiled by the United States Department of State, the censuses of 1840, 1850, and 1860 include
free and slave states as well as territories and show that the population of the United States nearly doubled
between 1840 and 1860 (http://www.agcensus.usda.gov/Publications/Historical_Publications/index.asp).
The growth in the number of farms in this period was also staggering: “Between 1860 and 1890 the number
of farms in the United States nearly tripled. Land under the plow rose from 407 million to 828 million
acres, ...” and was directly tied to westward expansion by the U.S. government (Postel, 2007). There were
1.5 million farms averaging 200 acres and 80 percent of exports were the products of agriculture (Rasmussen, 
1990). Immigration, the growth of urban areas, industrialization, western expansion, emerging 
transportation and communications networks, and the Civil War each had a profound impact on agricultural
production. Barron, in his study of the North after the U.S. Civil War, calls the period “the second 
transformation.” Influenced by the work of Alfred D. Chandler, Barron characterizes the period as one that 
witnesses the “rise of big business and the emergence of a new corporate mentality.” Also at play for Barron 
is the emergence of a class of bureaucrats and managers — the new “middle class.” Growth in consumer 
goods and the expansion of mass culture are critical elements, but perhaps most importantly for the purposes 
of this study, the period is one of significant growth of the power of the state (Barron, 1997). 

Many of the challenges associated with these changes centered on the need for greater crop yields, 
new types of crops, and techniques to ensure soil health. Though there had been support and agitation for
a federal agricultural agency for decades, it was not until 1862, in the midst of the Civil War, that the
Congress of the United States established the Department of Agriculture. It was the first executive agency
created in a period in which the federal government began to assert an expansive and authoritative role.
Examples of that expansiveness include a communications and transportation infrastructure spanning the 
continent and the groundwork for public universities that reframed education with a special eye to 
agriculture and applied science (Solberg, 1968; Williams, 1969; Hobsbawm, 1975; John, 1995). 

The Department was the first federal agency to engage in scientific research and the creation and 
dissemination of new knowledge on a massive scale, and it did so at a critical moment in the history of the 
United States between 1862 and 1888. The acquisition, production, and dissemination of information were 
at the heart of the work of the Department. That information and the systems developed to gather, produce, 
and share that information constitute the broader focus of my research. 


3 Research Questions and Conceptual Framework 


Broadly speaking, the purpose of this study is to see how systems of agricultural information at the 
Department of Agriculture drove change in the practice and economy of agriculture in the late 19th century 
U.S. and fed its emerging role as an imperial power (Williams, 1969). My research questions: 


1. What were the systems of information developed at the Department of Agriculture between 1862 
and 1888? How did those systems help facilitate the procurement, propagation, and diffusion of 
information by the Department of Agriculture? What were the types and ranges of that 
information? 

2. How did the information work at the Department of Agriculture help transform the political 
economy of agriculture in the U.S. in the second half of the 19th century as the nation continued its 
westward expansion and began to assert itself as an imperial power? 


This study analyzes how the information work of the Department of Agriculture progressed from its 
instantiation as a federal agency in the midst of a civil war in 1862 to where it stood nearly 30 years later 
when the Department gained greater legitimacy, support, and a place in the President's cabinet. In many 
ways this study is built upon an anatomy of the annual reports that seeks to make explicit how they 
embodied the systems of information work at the Department and how they served as vehicles for 
information transfer and scientific communication. The annual reports embody and articulate the work of 
the Department of Agriculture. 

This research is fundamentally historical and draws on both primary and secondary resources. 
Political economy of information forms the basis for my conceptual framework with its focus on power and 
how information is mobilized for control that asserts authority and neutralizes/tempers resistance (Mosco 
& Wasko, 1988; Schiller, 2007). Its explicit concern with questions that locate and analyze structures of 
power and those who hold, wield, and benefit from it adds depth to my work. 

I am also interested in understanding how farmers and others contributed to the production of 
agricultural information by the State and how they received and used that information produced by the 
Department of Agriculture. Was there resistance? Are there significant counter-narratives? Why do so many 
studies locate change in the 20th century and not the 19th? Why is change in agricultural practice seen as 
driven by technology and mechanization? What are we missing if we do not examine the relationship 
between the assertion of authority and power and the actual practice of farming? 

Historical research is rooted in the archives and this study is no exception. The archival records 
and manuscript material for the Department of Agriculture held at the National Archives and Records 
Administration at College Park, Maryland and the National Agricultural Library in Beltsville, Maryland, 
though scattered and not very deep for the period I am studying, have been critical to my work (Pinkett, 
1962). Examples of some of the surveys and correspondence about the exchange of plants and seeds provide 
concrete supporting evidence of claims made in the annual reports. In addition, the records at NARA and 
the National Agricultural Library helped identify connecting threads in the agricultural press, agricultural and 
other professional societies, seed companies, and the archives of a selection of land-grant public universities 
in the United States. 

I survey selected secondary literature about the history of the Department and its information work 
and use it to situate my work in the larger discourse about the research and educational work of the 
Department of Agriculture and the relationship of that work to the emergence of the modern state and 
American empire. 

Literary scholar Oz Frankel argues that annual reports are critical expressions of state authority: 
“nineteenth-century government reports were packaged, disseminated, and even consumed as books and 
could be found in libraries or purchased in bookstores. In fact, the antebellum public sphere was cluttered 
with annual and special reports...” (Frankel, 2010). The annual reports of the Department of Agriculture 
between 1862 and 1888 serve as my most valuable primary resources for this study. Government documents 
like the annual reports and the reports of surveys and expeditions had multiple uses and tangled intentions. 
Their narratives are often layered, speaking to and serving multiple audiences. Annual reports are examples 
of the literature of organizations and they carry complex messages. 

The annual reports of the Department of Agriculture are a critical resource for research on the 
information functions of the Department, the priorities of the federal government, the institutionalization 
of scientific research and the centralized place for agricultural research in the federal government of the 
United States in the 19th century. Indeed, they have served as one of the primary resources for most 
histories of the Department due in large part to the meager archival record (Swank, 1872; Greathouse, 1898; 
True, 1912; Wanlass, 1920; Weist, 1926; Gaus & Wolcott, 1940; Ross, 1946; Rossiter, 1975; Dupree, 1980; 
Hamilton, 1990; Rasmussen & Baker, 1992; Carpenter, 2001). In the face of such a dramatic loss of 
documentation, the annual reports for this period serve a unique and valuable purpose. 


4 Preliminary Findings 


In the second half of the 19th century agriculture was essential to the nation's security and health. The 
Department of Agriculture was established by the Congress to fulfill the responsibility to support and promote 
progress in agriculture toward those ends. Indeed, the annual reports of the department are filled with 
rhetoric asserting agriculture's place as the bedrock of the economy: “Agriculture furnishes the food of the nation, the 
raw materials of manufactures, and the cargoes of domestic and foreign commerce. It is the cause and the 
evidence of true civilization; for, when tillage begins barbarism ends, and the various arts commence” 
(USDA, 1862). 

Information work conducted by the Department of Agriculture focused on applied science and 
agricultural statistics intended to bolster a nascent market economy with a global reach through increased 
efficiency in agricultural production. The Department generated, organized, and disseminated information 
that offered farmers information on new practices, new seeds and plants, and new tools that would result 
in increased crop yields and improve the efficiency of farm production. The seeds and plants were 
information, too. Bronwyn Parry (2004) argues that “seeds became useful proxies” for plants and, unlike 
botanical illustrations or descriptions, contained within them the information necessary for reproduction. 
Kloppenburg (2004) and Schiller (2007) follow Parry and look at plant biotechnology and the 
commodification of seeds and plant germ-plasm. 

Statistical information from the Department of Agriculture allowed farmers and other stakeholders 
in the economy of agricultural production and circulation to anticipate future crop yields and demand, and 
manipulate prices and distribution. Theodore Porter argues that the use of the term “statistics” is associated 
with a “great explosion of numbers” in the early 19th century. It had a dramatic impact on the organization 
and expression of knowledge by the expectations it “placed on people to classify things so that they could 
be counted and placed in an appropriate box on some official table, and more generally its impact on the 
character of the information people need to possess before they feel they understand something...” (Porter, 
1986). He distinguishes “political arithmetic” from statistics, which was tied to both scientific thought and 
philosophical theory interested in explaining natural and social phenomena. The former, political 
arithmetic, was more directly used by or for the state's centralizing bureaucracy and its use of information 
to control. The latter, statistics, became associated with science and assertions about natural law locating 
truth in something outside the state. The statistical work of the Department of Agriculture grew during 
the same period that Porter suggests a transition from political arithmetic to statistical work concerned 
with variation, but not yet at the point of demonstrating causal relationships and measuring probability 
(Porter, 1986). 

The need for reliable and “accurate” information was emphasized by commissioners of the 
Department of Agriculture in its annual reports throughout the late 19th century. The accuracy of that 
information served to rationalize agricultural production in an emerging global market economy. The 
Department report for 1865 provides one example of the perceived power of its statistical information: 
“These estimates are ... published in the reports of this department, and by the information thus made 
public the commerce in farm stock and their products is regulated, and the farmer’s attention is timely 
directed to a decrease or over-production of any one of them. Heretofore an evil in our agriculture, was 
over-production, occasioned by a casual demand from abroad; but the tables of this department like the 
regulator of the steam-engine, will do much to prevent either a deficiency or its opposite.” This information 
also served to assert and then reinforce the authority of the federal government at a critical moment in its 
history and its role as a source of resources and information vital to agricultural production in that growing 
market economy. 

My study of the annual reports of the Department of Agriculture suggests that the systems of 
information that characterized its work were integral to the transformation of agricultural practice and 
knowledge in the United States. More specifically, the Department of Agriculture’s information work, which 
took a number of forms, defined, fed, and nurtured that transformation. It was a prodigious collector, 
producer, and distributor of information that served complicated and complex purposes. 

Further, my analysis demonstrates how the Department of Agriculture was, perhaps more than any 
other federal agency, a place where we can see evidence of the emergence of a modern state and the exercise 
of a central state authority in the United States. Throughout the second half of the long 19th century the 
Department of Agriculture was an arena in which the role and authority of the federal government and 
that of the states was contested and negotiated. In its work we can also see instantiations of an emerging 
infrastructure of empire, with imperial aspirations made possible in large part by the agricultural 
information it collected, produced, and distributed. 


5 Contribution to Information Scholarship 


This work critically engages and contributes to a growing body of literature in the emerging field of 
information history (Black, 2006; Schiller, 2007; Weller, 2007). Evidence of information work at the 
Department of Agriculture from 1862 to 1888 challenges assumptions about the origins of the so-called 
information society and the place of information in the economy (Bell, 1973; Machlup, 1962; Porat, 1977). 
At the same time, it contributes to our understanding of the essential importance of systems of information 
to the functions of the state and the infrastructure the state develops to support its information work, assert 
its authority, and grow its power. My work enlarges and grounds our understanding of the state as an 
information machine in such a way that control and surveillance are no longer sufficient categories for 
understanding how information and the state develop and interact. 

Agricultural information work at the Department of Agriculture in the 19th century balanced 
economic viability and concepts of democratization and the public good. My work, by intensively examining 
this critical period in the development and maturation of state-sponsored information work, also helps to 
historically situate contemporary debates about the role of government in the development of 
cyberinfrastructure and the value and responsibility of publicly funded research. 


Abstract 

The workshop centers on a book, “The Discipline of Organizing” (MIT Press, 2013) that proposes a 
unified trans-disciplinary perspective on one of the core subjects for iSchools: information organization 
and retrieval. The printed book and several digital/ebook versions will be the focus of a case study in 
collaboratively developing and maintaining a teaching resource that supports cooperation across the 
different iSchools. The workshop will present the concept of the “Discipline of Organizing” (TDO) and several 
examples of TDO in teaching, and will invite participants to develop a vision for a collaborative teaching 
environment and process. 


1 Introduction 


The TDO project was conceived in response to the question “what is an iSchool” and the corollary questions 
about what the iSchools should be teaching, i.e. a common core. The “Discipline of Organizing” proposes 
as a trans-disciplinary framework the concept of an Organizing System, an intentionally arranged collection 
of resources and the interactions they support. The book discusses the main functionalities common to all 
organizing systems, i.e. selecting resources, organizing them, designing resource-based interactions, and 
maintaining and adapting the resources and their organization over time. TDO draws on the concepts and 
case studies from many fields, most notably library and information science, computer science, informatics, 
cognitive science, law, economics, and business. 

Initiated at Berkeley, TDO ultimately involved faculty from 4 different iSchools, totaling 17 co- 
authors. TDO's concepts and terminology are intentionally interdisciplinary and abstract to demonstrate 
the applicability to many different domains. Endnotes categorized by discipline (e.g. LIS, law, cognitive 
science) augment the book to enable more intense engagement from different disciplinary perspectives. The 
book embodies numerous other innovations in book design and implementation, including interactive self- 
study quizzes, annotation capabilities, and user contributed content. TDO is already in use at more than a 
dozen schools in the fall of 2013. 


2 Purpose and Intended Audience 


The workshop proposes to bring together (a) people teaching from TDO, many of whom have not met in 
person before, to discuss and learn from each other’s experiences and (b) people teaching classes in 
information organization and retrieval who would be interested in collaborating in this joint conceptual and 
educational endeavor. 

Participants will discuss and exchange experiences in teaching core concepts of information 
organization and retrieval from different disciplinary perspectives and explore inter-school collaboration in 
a shared digital teaching environment. The main questions to be explored are: 


1. How can we create a “living” book that moves beyond the main text (while maintaining its integrity) 
by augmenting the content with student- and lecturer-generated disciplinary perspectives generating 
“versions” adapted to specific classes or schools? 


2. How can we develop a resource-sharing environment that provides teaching tools (not only a 
“text” book, but digitally enhanced learning and collaboration features), moving iSchool education 
into the digital world with all its capabilities? 


3 Goals or Outcomes 


The main goal of the workshop is to develop a shared vision on how to teach important core subjects. This 
includes bringing together people from various backgrounds to develop the emerging vision that TDO could 
evolve into a collaboratively maintained textbook and online course managed and delivered by the iSchools 
as a shared resource. 

Outcomes, case studies and sample teaching content will be provided on the TDO web portal 
(tdo.berkeley.edu). 


4 Proposed Format 


The workshop will include organizer presentations and invited short talks from TDO lecturers as well as 
discussion sections. 

Proposed schedule: 

0:00-0:30 Introduction of workshop attendees 

0:30-0:45 Introduction to TDO as a project 

0:45-1:10 Principles of TDO 

1:10-1:30 TDO's Design Innovations 

1:30-1:40 Break 

1:40-2:30 TDO in teaching 
> Using TDO as core text (TDO-centric syllabus) (10 mins., UC Berkeley) 
> Using TDO as core text in a particular track of a course (10 mins., UNC) 
> Using TDO as a supplemental text (augment disciplinary/specialized literature) (30 mins., Humboldt) 
We will discuss how the TDO concepts map to current LIS (and other disciplinary) curricula. 

2:30-2:45 Break 

2:45-3:30 Collaboratively maintaining a text book 
We will present 2 innovative teaching strategies already implemented: 
> Using TDO in MOOCs (UNC) 
> Using TDO for collaboration: the annotation experiment (UCB / UNC) 
We will discuss how to develop and share resources in a joint digital environment: 
> for teaching (annotations, quizzes, assignments, exams) 
> for disciplinary perspectives (endnotes, annotations) 


5 Conclusion 

Information organization and information retrieval are core subjects in every iSchool curriculum. TDO is an 
initiative to develop a common vocabulary across schools and disciplinary perspectives. The development 
of the book was already a collaborative process between different schools, and the workshop will encourage 
more collaboration, not only from a disciplinary, but also from an educational perspective. Both disciplinary 
and educational visions will be discussed, and a new vision of publishing (paper and different ebook versions, 
enhanced with disciplinary perspectives) will be explored. 


Abstract 

This workshop will provide a unique opportunity to consider how making and fixing, practices which 
frequently take place during the course of academic research, can provide unique, different and insightful 
research perspectives. To actively explore the connection between making and research, each participant 
will be asked to create a tangible artifact prior to the workshop which will serve as an embodiment of 
his/her research or some aspect of this research. The term tangible artifact is used broadly here and can 
include artifacts produced using various mediums. The guiding questions are: 1) How can the process of 
making challenge us to be more self-reflective and critical about the research we are conducting? 2) Can 
making add a dimension of tangibility to research that is distinct from other research activities? 3) How 
can reflecting on making and telling stories about the making process illuminate and stimulate learning 
and assist in research conceptualization? 


1 Introduction 


The increasing digitization of our world, from data and knowledge to social interactions and relationships, 
has prompted scholars in information and social sciences to consider the significance of materiality through 
making and fixing. These questions have been explored in various socio-technical contexts, from book 
restoration to 3D printing. In these activities, reflection is often essential, as it allows the maker to pivot 
and maneuver around challenges, to creatively and critically design, develop and invent. Reflecting on the 
act of making, both during and after the making process, can help uncover and make salient the invisible 
traces of making, and in the process provide the opportunity to stimulate reflection on the relationship 
between maker and material results. 

Making and fixing are practices that, at their core, center on the relationships among information, 
people and technology. Previous research conducted by several of the hosts of this workshop examines how 
hands-on production and/or repair can enable one to explore relationships between digital technologies and 
society in a deep and reflective manner.¹,² Additional research has focused on the intersection of technical 
skills and materiality.³ 


1 Jackson, S. J., Pompe, A., & Krieshok, G. (2012). “Repair Worlds: Maintenance, Repair, and ICT for Development in Rural Namibia,” 
in Proceedings of the Computer-Supported Cooperative Work (CSCW) Conference, Seattle, Washington, Feb 11-15, 2012. 

2 Ratto, M. (2011). “Critical Making: Conceptual and Material Studies in Technology and Social Life,” The Information Society 27(4). 

3 Rosner, D. K. (2012). “The Material Practices of Collaboration,” in Proceedings of the Computer-Supported Cooperative Work (CSCW) 
Conference, Seattle, Washington, Feb 11-15, 2012. 

Yet, scholars rarely use these practices as a self-reflective lens through which to further develop 
their own research. Ian Bogost has used the term “carpentry” to describe the philosophical practice of 
craftwork.⁴ Bogost contends that philosophers (and academics, for that matter) tend to confine productivity 
to the act of writing, which is only one, less accessible activity. To remedy this issue, he calls for philosophers 
from various fields to engage in the act of doing, by making things. His reasoning is that philosophy should 
serve the world and inform what he calls “the carpentry of things”: the way things mold each other and 
the broader world.⁵ 

Taking inspiration from this call to action, this workshop will provide a unique opportunity to 
consider how making and fixing, practices which frequently take place during the course of academic 
research, can provide unique, different and insightful research perspectives. To actively explore the 
connection between making and research, each participant will be asked to create a tangible artifact prior 
to the workshop which will serve as an embodiment of his/her research or some aspect of this research. The 
term tangible artifact is used broadly here and can include artifacts produced using mediums such as 
photography, sewing, woodworking or other related approaches. 

The following questions will be used to broadly frame our discussions about the significance of 
making in relation to academic research and the construction of research narratives: 


1. How can the process of making challenge us to be more self-reflective and critical about the research 
we are conducting? 

2. Can making add a dimension of tangibility to research that is distinct from other research activities? 

3. How can reflecting on making and telling stories about the making process illuminate and stimulate 
learning and assist in research conceptualization? 


The intended audience for this workshop will be scholars who are interested in exploring how their making 
skills can help deepen their engagement with their research as well as those scholars interested in exploring 
materiality in their work. 

In accordance with this year’s iConference theme, “Breaking Down Walls: Culture-Context- 
Computing,” this workshop will explore how the unconventional lens of making can inform the narrative 
crafted by academic researchers. 


2 Participants 


Interested participants should submit a 500-word position statement that includes a brief description of the 
individual’s area of research and addresses the following question: How do you currently practice making 
or fixing in your research? Scholars interested in participating should submit their position statements to 
klhassma@syr.edu by Feb. 17, 2014. Participants will also be asked to create or bring some type of existing 
tangible artifact that embodies their research to the iConference. 


3 Tentative Workshop Agenda 


9:30-10:00 Welcome & Introductions 
10:00-11:00 Artifact Sharing Session in Small Groups 
11:00-11:15 Break 
11:15-12:15 Large Group Discussion of Artifact Sharing 
12:15-1:00 Lunch 
1:00-2:30 Interactive Discussion w/ Open Design City and Fab Lab Berlin 
2:30-4:00 ‘Zine Making Session 
4:00-5:30 Visit/Tour of Open Design City 


4 Bogost, I. (2012). Alien Phenomenology, or What It’s Like to Be a Thing. Minneapolis: University of Minnesota Press. 
5 Bogost, I. (2012, May 5). The Aesthetics of Philosophical Carpentry [Blog Post]. 


The morning half of the workshop will be dedicated to sharing and discussion, built around the tangible 
artifacts that participants bring to the workshop. In small groups, each individual will share the process of 
making their artifact and how it serves as an embodiment of research, which will be followed by discussion 
to highlight common themes and identify fertile differences. During the last half hour of the morning sharing 
session, each small group will also be asked to share their insights with the larger group. We envision small 
groups to consist of 4-5 people. 

In the afternoon, we will begin with a hands-on group activity, developed in concert with our 
panelists, which will be followed by an interactive discussion. The discussion will feature speakers from the 
Berlin makerspaces, Open Design City and Fab Lab Berlin. The workshop will end with a field trip to Open 
Design City (ODC), where session participants will have the chance to experience a physical space created 
specifically for making. Open Design City is within walking distance of the conference venue, which will 
make it logistically easy for participants to travel there. 


Abstract 

This workshop brings together researchers from different streams and communities that deal with 
information access in the widest sense. The general goal is to foster collaboration between the different 
communities and to showcase research that sits at the border between different areas of research. 


1 Motivation 


Our research communities are remarkably scattered. For an outsider it must seem obvious that information 
science (IS), information retrieval (IR), human-computer interaction (HCI) and natural language processing 
(NLP/HLT) go hand in hand. However, there is surprisingly little overlap between these communities, 
perhaps best illustrated by conducting a simple citation analysis of the papers published at the top annual 
conferences in each area which reveals that there is little cross-disciplinarity. Going deeper into the research 
conducted in each discipline we find that even the basic assumptions to access and utilise information vary 
from one field to another, e.g. while researchers in IR tend to start with the “bag-of-words” assumption, a 
researcher in NLP would never dare doing something like this; while information scientists often face 
structured documents that need to be accessed (e.g. digital libraries), such structures must first be acquired 
from a database of images created in a lifelogging scenario before any access is possible, and so on. 

Users have started to take centre stage in information access research even within the IR community 
(as illustrated by a substantial number of relevant papers presented at SIGIR 2013) but there is still a long 
way to go to identify and employ information systems that incorporate both state-of-the-art methods for 
information access, search, navigation as well as human computer interaction and user experience (one just 
needs to pick a few randomly selected university library catalogues as evidence). The reason we identified the 
iConference as the best place to organise the workshop is that the urge to integrate the user in the information 
access process is deeply integrated in the research conducted by some of the best known iSchool research 
groups, e.g. the idea of human-computer information retrieval developed by Gary Marchionini (UNC) and 
human-centered information retrieval identified by Nick Belkin (Rutgers). Some of these ideas have sparked a 
lot of interest in working at the interface between different disciplines and this has also been demonstrated by 
newly established conferences such as IIiX (Information Interaction in Context) and affected some of the 
primarily technical evaluation efforts in the IR community such as the Text Retrieval Conference (TREC) 
series, in particular the Interactive track and the Session track. Nevertheless, the majority of the researchers 
in the different fields remain ignorant of what is going on outside their main topics of interest and that is 
partly because there is no appropriate forum to bring these ideas together and discuss them. Ultimately the 
workshop aims to provide a forum where researchers from different communities feel at home and exchange ideas for future research 
directions. 

The workshop presents a mix of keynotes, submitted research papers/posters/demos and a panel 
discussion. We are very pleased to have Nick Belkin (Rutgers) as a keynote speaker at the workshop. 

We issued a Call for Papers asking for submissions of position papers as well as novel research 
papers and posters/demos addressing problems at the interface of IS, IR, HCI and NLP, listing these topics 
as a general guideline: 

• Interactive IR 
• Adaptive IR 
• Recommender Systems 
• Novel methods to access digital libraries 
• User studies 
• User/group profiling 
• Lifelogging 


All papers have been peer-reviewed by the programme committee, which consists of experts drawn from the 
different communities, guaranteeing a mix of industrial and academic backgrounds. 


2 Programme Committee 
• Leif Azzopardi, University of Glasgow (United Kingdom) 
• Paul Clough, University of Sheffield (United Kingdom) 
• Martin Halvey, Glasgow Caledonian University (United Kingdom) 
• Hideo Joho, University of Tsukuba (Japan) 
• Evangelos Kanoulas, Google (Switzerland) 
• Jussi Karlgren, Gavagai (Sweden) 
• Birger Larsen, Aalborg University (Denmark) 
• Jochen Leidner, Thomson Reuters (United Kingdom) 
• Gary Marchionini, University of North Carolina at Chapel Hill (USA) 
• Doug Oard, University of Maryland (USA) 
• Alan Said, CWI (The Netherlands) 
• Klaus Schoeffmann, Klagenfurt University (Austria) 
• Pavel Serdyukov, Yandex (Russia) 
• Jialie Shen, Singapore Management University (Singapore) 
• Ryen White, Microsoft Research (USA) 
• Max Wilson, University of Nottingham (United Kingdom) 


3 Workshop Site 
More details about the workshop can be found at: http://mindthegap2014.dai-labor.de/ 




Digital Youth: Towards a New Multidisciplinary Research Network 


Beth Juncker¹, Gitte Balling¹, Marianne Martens², Theresa Anderson³, Eliza T. Dresang⁴, 
Karen E. Fisher⁴ and Katie Davis⁴ 

1 University of Copenhagen 
2 Kent State University 
3 University of Technology, Sydney 
4 University of Washington 


Abstract 

The workshop will discuss and reinterpret our collective understanding of the information-technology- 
people triad and accompanying concepts in order both to broaden and to sharpen the focus on Digital 
Youth. The workshop wants to break down walls, to cross disciplinary borders, and to establish dialogues 
among researchers across continents and LIS traditions in order to contribute to the development of LIS 
research communities. The goals of the workshop are: 

• to examine how Digital Youth can function as an overall research frame. 
• to establish dialogue and cooperation between and across disciplines and perspectives. 
• to define the field so as to remain open to broader theoretical and methodological perspectives. 
• to provide a statement of purpose inviting other researchers to join the research initiative. 


Abstract 

This full-day workshop examines conceptual and practical aspects of collections and the context they 
provide in the digital environment, especially in large-scale cultural heritage aggregations. Collections 
will be considered in relation to the information needs of scholars, roles of cultural institutions, and 
international interoperability. The workshop aims to: 1) broaden the conversation across an international
community, 2) further the research and development agenda for digital aggregations, 3) relate
conceptual advances to implementation goals, and 4) identify realistic approaches for collection
representation, contextualization, and interoperability at scale.

1 Description


Trends in interoperable content and open data raise important questions on how to represent complex
objects, curated and dynamic collections, and context in ways that benefit users and collecting institutions.
This workshop will provide a forum for international engagement on this topic and give
iSchools the opportunity to build a community around our strengths in this research area.
Sessions will be led by European and North American experts from iSchools and projects developing
large-scale digital cultural heritage collections.


• Morning session: Conceptual Foundations of Digital Collections
  o Carole L. Palmer & Karen Wickett (CIRSS, University of Illinois)
  o Hur-li Lee (School of Information Studies, University of Wisconsin-Milwaukee)
  o Martin Doerr (Institute of Computer Science, Foundation for Research and Technology - Hellas)
  o Carlo Meghini (Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche)

• Afternoon session: Practical Implications for Building Digital Collections
  o Antoine Isaac (Europeana)
  o Emily Gore and Amy Rudersdorf (Digital Public Library of America)
  o Sheila Anderson (Centre for e-Research, King’s College London)
  o Shenghui Wang (OCLC Research)
  o Mark Stevenson and Paul Clough (Department of Computer Science, University of Sheffield)


We encourage participation from: 


• Faculty and students from iSchools involved in research and education in information organization,
  cultural heritage, digital collections and archives, and metadata.
• System designers and developers interested in the creation of metadata schemas and promoting
  interoperable digital cultural heritage content.


2 Background Reading 
Modeling Cultural Collections for Digital Aggregation and Exchange Environments is a whitepaper developed
by researchers from the Europeana Foundation and CIRSS. It discusses functions of collections in cultural
heritage aggregations and proposes a formal extension to the Europeana Data Model to explicitly
accommodate representation of collections and collection/item relationships. A public release of the paper
is available at http://hdl.handle.net/2142/45860.

Position papers by session contributors and additional workshop details will be posted as they
become available at http://bit.ly/collectionsworkshop2014.

Abstract

iSchools are home to many researchers who study the practices of scholarly publishing, a significant 
number of whom also design, build, and evaluate innovative tools for scholarly publishing. Yet when it 
comes to our own work, many scholars at iSchools have been reluctant to try new modes of publishing, 
a trend this workshop aims to correct. Many researchers maintain an appropriate skepticism of the claims 
made by evangelists of the new modes of publishing; thus, this workshop starts with an understanding 
that current publishing paradigms are embedded in socio-material practices and institutional logics. 
Nevertheless, a growing number of scholars at iSchools and iSchool-related programs are experimenting
with new ways of publishing their own work. This workshop will bring together researchers of scholarly
communication practice and designers of research publishing tools with those who are considering their
own scholarly publishing projects. The aim is to encourage scholars considering experiments through an
exploration of both the tools now available for experimentation and expert insights into innovations in
scholarly communication.

Keywords: scholarly communication, publishing, new media 

Citation: Finn, M., Shaw, R., & Walker, S. (2014). Changing Publishing Practices in iSchools. In iConference 2014 Proceedings 
(p. 1178-1179). doi:10.9776/14227 

Copyright: Copyright is held by the authors. 

Contact: megfinn@gmail.com, ryanshaw@unc.edu, stw3@uw.edu 


1 Goal 


The ultimate goal of this workshop is to encourage iSchools to be leaders in practicing as well as analyzing 
and building the future of scholarly publishing. We will work toward this goal by bringing together scholarly 
communication researchers and designers with those who are either already engaged in innovative publishing 
projects or who would like to be. We envision three ways in which this workshop will benefit participants. 
First, participants will walk away with: (a) ideas for new experiments in publishing their own work; (b) 
specific plans and resources for executing these experiments; and (c) a network of people to draw upon for 
support. Second, designers and builders of publishing tools will meet and learn from potential users of those 
tools. Finally, researchers of scholarly publishing will find opportunities for applying their expertise close to 
home as well as potential new sites of study. With these goals in mind, we plan to host a follow-up workshop
in 2015 to see what progress has been made.


2 Participants 


There are three interrelated groups of people that this workshop will include: (1) researchers, or those
studying scholarly publishing practice; (2) designers, or people who have conceived, designed and built tools
or platforms to assist in scholarly publishing; and (3) authors, or scholars who have published, or are
considering publishing, research in non-traditional ways. Of course many participants may identify with more
than one group. By including practitioners who are innovating in scholarly publishing and researchers
interested in scholarly publishing, we hope to bring people with expertise in different areas into conversation
with each other. Furthermore, we hope that by inviting people who have works in progress or ideas about how to
publish their research, we might give them the tools or support they need to complete their project. We do
not know how many people this workshop will draw, but we hope to have around 30 participants.


3 Format 


We are soliciting applications for participation consisting of a short abstract describing the applicant’s 
research, designs or ideas as well as a bio. From these applications we will select the participants in the 
workshop, attempting to be as inclusive as possible while maintaining a good balance among researchers, 
designers, and authors. The workshop will be divided into three sessions: presentations, “getting to know 
you,” and design of research or publishing. 

During the first session, all the workshop participants will be expected to present something: either 
their research about work that is already being done, examples of innovative publishing, or ideas for what 
they would like to do in the future. Participants who have done research on creative scholarly publishing 
— historical or present day — will be invited to give short presentations. Participants who have designed 
or built novel publishing tools will be asked to give demonstrations. Finally, participants with "work-in-progress"
or "ideas" will be asked to present their idea during a "lightning round."

In the second session, participants will be paired up on "dates" during which they can follow up on 
the previous presentations or look for other areas of overlap. This session will coincide with lunch. 

In the third session, participants will form groups and spend two hours conceptualizing and presenting
specific plans for experiments in publishing group members’ scholarship. Each group will focus on
developing an action plan for the "authors" in the group.

Abstract

Interdisciplinarity is in the DNA of the iSchools. This workshop invites you to discuss how
interdisciplinarity plays out in theory and practice. The workshop addresses the uniqueness of the iSchools
and provides an interactive framework to discuss and reflect on interdisciplinary practice. It suggests some
models and tools to describe relations between disciplines, while offering a venue to brainstorm and
envision issues of interest with like-minded colleagues. The purpose of this workshop is to establish a
setting for continuous dialogue among colleagues on how interdisciplinarity plays out in practice. The
workshop aims to create a forum for reflection on local interdisciplinary practice(s) and to consider the
possibilities of forming research networks. The workshop opens with a panel presentation from iSchool
deans and senior faculty discussing current interdisciplinary practices in iSchools and with presentations
that address theoretical frameworks of interdisciplinarity. These presentations will form the basis for
small group discussions in the afternoon.

1 Introduction


Interdisciplinarity is in the DNA of the iSchools. This workshop invites you to discuss how interdisciplinarity
plays out in theory and practice. The workshop addresses the uniqueness of the iSchools and provides an
interactive framework to discuss and reflect on interdisciplinary practice. It suggests some models and tools
to describe relations between disciplines, while offering a venue to brainstorm and envision issues of interest
with like-minded colleagues.

Please visit the workshop website at http://interdisciplinarity.cci.fsu.edu/. 


2 Organization 


2.1 Organizers 


• Dorte Madsen, Copenhagen Business School, dma.ikl@cbs.dk
• Shuyuan Mary Ho, Florida State University iSchool, smho@fsu.edu


2.2 Panelists 


• Elizabeth Liddy, Dean, Trustee Professor, School of Information Studies (iSchool), Syracuse
  University
• Mike Eisenberg, Dean Emeritus, Professor, School of Information (iSchool), University of
  Washington
• Kathleen Burnett, Chair, Professor, School of Library and Information Studies (iSchool), College
  of Communication and Information, Florida State University
• Steve Sawyer, Professor, PhD Program Director, School of Information Studies (iSchool), Syracuse
  University
• Harry Bruce, Dean and Professor, The UW Information School
• David Fenske, Dean, College of Information Science and Technology, Drexel University


2.3 Presenters 


• Interdisciplinary theory on relations between disciplines - a theoretical framework for discussing
  multi-, inter- and transdisciplinarity: Dorte Madsen, Associate Professor, Copenhagen Business
  School
• Measuring interdisciplinarity: Staša Milojević, Assistant Professor, School of Informatics and
  Computing (iSchool), Indiana University
• Challenges in Interdisciplinary Communication: John M. Budd, Professor, School of Information
  Science & Learning Technologies (SISLT), University of Missouri


3 Overview 


As described in the iSchools’ vision¹, an iSchool provides the venue that enables scholars from a variety of
contributing disciplines to leverage their individual insights, perspectives, and interests, informed by a rich,
“trans-disciplinary” community, and iSchools foster the development of an intellectual space where true
interdisciplinarity plays out.

In this workshop we invite participants to discuss and refresh our visions and practice of 
interdisciplinarity. In particular, we hope to identify ways the contributing disciplines are informed by the 
transdisciplinary community. Moreover, we are interested in discovering the relationships between 
information, technology, policy, people and society as translated in daily practice through: 


• research projects
• curriculum development
• teaching


We believe that the specific disciplines, and combinations of disciplines, will emerge to connect and
represent these relationships across information, technology and people (Madsen, 2013).

Following interdisciplinary identification, we would further discuss and discover new patterns of
interaction, between people and/or between disciplines. For example, how are collaboration bridges built
between disciplines? What are the building blocks of these bridges (e.g., concepts, theories, methods, or
interdisciplinary research groups/faculty)? What types of problems have iSchool researchers encountered?
Are they

1. questions identified within a single discipline?
2. questions found in the intersections of disciplines?
3. questions found in the gaps between disciplines?
4. questions that cross disciplines? Or,
5. issues and questions without a compelling disciplinary basis? (Lattuca, 2003).


The workshop opens with a panel presentation from iSchool deans and senior faculty discussing the current 
interdisciplinary practices in iSchools and addresses frameworks of multi-, inter- and transdisciplinarity. 
These presentations will form the basis for small group discussions in the afternoon. 


¹ http://test.ischoolsorg.syr.edu/about/history/vision/

3.1 Purpose


With this initiative, our short-term goal is to invite colleagues to discuss how interdisciplinarity plays out 
in practice, to create a forum for reflection on local interdisciplinary practice(s) and to consider the 
possibilities of forming effective research networks. 

Moreover, the long-term goal of this workshop is to foster the connection of interdisciplinary
practices with interdisciplinary theory, that is, the theory development of the Information
Field. This goal requires the development of a research network where

• reflections on disciplinary and interdisciplinary practices come together
• relationships between information, technology, and people are analyzed and mapped
• contributing disciplines, and relations between disciplines, are mapped. Such mapping is intended
  to serve as a point of departure for analyzing multi-, inter- and transdisciplinarity in the Information
  Field.


3.2 Intended Audience 
iSchool faculty, researchers, administrators, Deans, Associate Deans, and stakeholders. 

Strategies to engage attendees include a pre-survey analysis, composing groups with similar or
different interests, and focusing on attendees’ own practice(s).


3.3 Goals or Outcomes

The workshop addresses the uniqueness of the iSchools, provides an interactive framework to discuss and
reflect on interdisciplinary practice, and offers some models and tools to describe relations between
disciplines. Furthermore, the workshop offers a venue to discuss issues of interest with like-minded
colleagues.


3.4 Relevance to the Conference 


The workshop intends to create common ground and a common working language for addressing issues of 
multi-, inter- and transdisciplinarity. Long-term relevance might be identity building and theory building of
the Information Field. 


4 Draft Agenda 


Duration      Topic

9.30-9.45     Informal self-introductions over coffee

9.45-10.00    Brief introduction to agenda, co-organizers
              The goal of the workshop, presented by Madsen.

10.00-11.15   Presentations (Madsen) and panel discussions, cf. above.

11.15-11.30   Coffee break

11.30-1.00    Panel discussions ctd.

1.00-1.30     Group lunch

1.30-2.15     General warm-up discussion
              Break-out group discussion prompts:
              • What do you see as your home discipline?
              • How is your home discipline informed by other disciplines?
              • How do you understand interdisciplinarity?
              • What does disciplinarity and/or interdisciplinarity mean to you in your daily
                practice?

2.15-3.30     Presentations (Budd): Challenges in interdisciplinary communication
              Discussion of specific interdisciplinary practices within pre-determined groups.
              Groups sit together at tables, organized according to pre-survey analysis.
              Discussions on the basis of input from the morning sessions and from essays submitted
              prior to the workshop.
              • Which challenges are you facing in your work in connection with
                interdisciplinarity?
                o in research projects (e.g., research methods, theories, etc.),
                o in curriculum development,
                o in teaching.
              • In research projects, how would you characterize the types of research
                questions you pursue in terms of a disciplinary and/or interdisciplinary basis?
              • Case studies: How do you experience cooperation with people from disciplines
                other than your own? Do you have and/or develop a common working
                language and/or a shared conceptual framework? To which extent is it
                required to make explicit, e.g., the basic assumptions of each discipline?
              Half of the groups will present summaries (organizers will offer structure for the
              presentations).

3.30-3.45     Coffee break

3.45-5.00     Presentations (Milojević): Measuring Interdisciplinarity
              Continued discussion of specific interdisciplinary practices within pre-determined
              groups.
              Second half of the groups present discussion summaries (organizers will structure this).
              Synthesis by organizers.
              • What did we learn that might shed light on the participants’ own
                interdisciplinary practices and which shared interests may be identified?
              • Can opportunities for future collaborations be identified?
              • Research network(s)?

5.00-5.30     Final wrap-up and sharing time
              • Collect feedback from the participants on the day’s events, debrief with
                attendees.
              • Brainstorm ideas for future events

Please visit http://interdisciplinarity.cci.fsu.edu/ for any updates.

5 Building an iCommunity
The workshop has its own website http://interdisciplinarity.cci.fsu.edu/ that includes: 


5.1 Pre-Workshop Participants Survey 


All participants are asked to fill out a questionnaire that will be used to understand the potential 
participants’ disciplinary and/or interdisciplinary backgrounds, the domain area they are teaching in, the 
curriculum they develop (individual vs. collaborative in nature), etc. We also invite short essays submitted 
by each participant describing his or her personal view of interdisciplinary practice. With their consent, we 
will make these submissions available to the public. 


5.2 Website for Interdisciplinarity in iSchools 
To continue the discussions from the workshop, and to provide a forum for reflections on interdisciplinary
practices and shared insights, resources and cases, we have created a website that is intended to serve as a
meeting point beyond the workshop.

Abstract

The annual Consortium for the Science of Sociotechnical Systems (CSST) workshop at the iConference 
perpetuates a tradition of providing sociotechnical scholars with a place to surface areas and domains 
ripe for new or renewed attention, highlight synergies that have gone unidentified previously, and 
establish new relationships. This year’s workshop, “Breaking Down and Building Up: Accelerating 
Sociotech Scholarship in the iSchool Community,” will pivot around the dual orientation of community 
building and scholarly action; the full day agenda will combine a morning of introductory talks and 
discussion with an afternoon of hands-on feedback sessions built around project ideas and paper drafts. 
We are particularly keen this year to bring together scholars from a wide variety of disciplines, 
nationalities, and histories so that our work together can itself break down barriers between ideas, schools, 
countries and perhaps continents to establish new mechanisms and pathways for integrated sociotechnical 
scholarship.

1 Introduction


Since 2008, the Consortium for the Science of Sociotechnical Systems (CSST) has provided a place for 
sociotech scholars at the iConference to surface areas and domains ripe for new or renewed attention, 
highlight synergies among scholars that have gone unidentified previously, and establish new relationships. 
Continuing in this spirit, yet attempting at the same time to reach beyond convention, we propose a full-day
workshop at the 2014 iConference that will have the dual orientation of community building coupled
with moving interested and active sociotech scholars into concrete scholarly action. We are particularly 
keen this year to bring together scholars from a wide variety of disciplines, nationalities, and histories so 
that our work together can itself break down barriers between ideas, schools, countries and perhaps 
continents to establish new mechanisms and pathways for integrated sociotechnical scholarship. 

In addition to a brief introduction to the sociotechnical approach within the larger iSchool
community of scholars, the morning half of the workshop will showcase a series of rapid talks from
established sociotech researchers on a broad range of pragmatic topics, e.g., sociotech approach and tenure;
sociotech approach and journal publishing; sociotech approach and teaching; sociotech approach and
methods. Confirmed speakers include Payal Arora, Erasmus University Rotterdam; Greg Downey,
University of Wisconsin-Madison; Kristin Eschenfelder, University of Wisconsin-Madison; Sean Goggins,
University of Missouri; David Ribes, Georgetown University; Steve Sawyer, University of Wisconsin-Madison;
Kalpana Shankar, University College Dublin.

The afternoon will be directed to two hands-on feedback sessions in which workshop participants 
can seek guidance, critique, counsel or any other form of constructive input within a small group setting. 
In advance of the workshop, each participant will be asked to provide an abstract or précis of a project or 
paper so that members of the review team (comprising both senior scholars and peers) can prepare to 
provide dedicated feedback. Feedback sessions will be grounded in an ethos of mentoring, but with the 
added benefit of creating intellectual spillover as participants share their work not only for expert critique, 
but also peer feedback. The workshop will end with a time for synthesis, in which senior scholars will 
identify emergent themes and areas for research based on their involvement in the mentoring sessions. 

We expect a variety of outcomes from this workshop ranging from individual project guidance to 
strengthened international ties among CSST enthusiasts. More specifically, we will also write a blog post 
for the CSST website (www.sociotech.net) detailing the projects presented at the workshop to enable others 
who were unable to attend to approach authors with questions or other forms of feedback. 

Interested participants should submit a 750-1000 word abstract or annotated outline for discussion 
by February 3, 2014. Members of each mentoring group will receive one another’s materials for pre-workshop 
review by February 17, 2014. With regard to the number of attendees, we welcome anyone interested to 
attend the workshop as an observer, but can only accommodate 36 people as part of the mentoring sessions.

Abstract

Genre theory has in recent years entered the study of information (Andersen, 2008). The question remains
whether we in the information field have made our own independent contributions to genre theory, rather
than only ‘applying’ genre theory. From six different positions, this workshop will discuss the role and
potential of genre studies in the study of information and what the study of information can contribute to
the study of genre. Coming from universities in North America and Denmark, the panelists will each
present a perspective or argument. Each perspective or argument is developed around the question of what
the study of information can contribute to the study of genre.

1 Introduction


Genre theory has in recent years entered the study of information (Andersen, 2008). The question remains
whether we in the information field have made our own independent contributions to genre theory, rather than
only ‘applying’ genre theory. From six different positions, this workshop will discuss the role and potential of
genre studies in the study of information and what the study of information can contribute to the study of
genre. Coming from universities in North America and Denmark, the panelists will each present a
perspective or argument. Each perspective or argument is developed around the question of what the study of
information can contribute to the study of genre. In no particular order, the six talks will be:


1. This talk will explore why rhetorical genre studies (RGS) is powerful and what it might mean to the
study of information. What is it that makes RGS such a powerful approach to the study of genre and
genred communication, and how can the study of information contribute to or challenge RGS with new
or different insights? Where do RGS and the study of information have common concerns and where
are they different?

2. This talk will offer insight into the locally situated ways in which information creators, seekers, and 
providers negotiate what counts as "information" in given contexts, and how generic forms are taken 
up as informative (or not). Examples of written genres of "keeping track" in everyday life and oral
genres of information provision in a clinical institutional setting will be provided. 

3. Rhetorical genre studies (RGS) has not adequately addressed the ideology of genres: the values
and power relationships they embody and perpetuate and the forms of knowledge they enable and
constrain. This limitation constitutes a weakness in the theoretical framework of RGS. This talk
will consider how empirical and historical studies exploring the ideology of archival genres have the
potential to strengthen the theoretical framework of RGS and to deepen and extend our
understanding of the historical evolution of genres.

4. A generation after the advent of new media, the relationship between digital forms of information 
and the conventional genres of print still remains unclear. This continued unsettledness has 
prompted a reconsideration of traditional genres of information as well as the use of genre theory 
itself. This talk will explore digital books through the lens of different genres in order to come to 
grips with some of the complicated practices of meaning-making in the 21st century. 

5. This talk will show the benefits that rhetorical genre studies may derive from an alliance with the 
archival discipline. In particular, the concern will be the method of inquiry involved in diplomatics 
as a rigorous way to analyze documentary forms and business processes (or actions), which are 
central aspects of genre theory. The archival understanding of ‘intertextuality’ will be discussed 
with the aim of providing genre scholars with new insights into the notion of genre system and the 
relationship between organizational genres of communication and evidence. 

6. Information history and genre theory share the understanding of seeing information as situated in 
specific contexts and being rhetorically framed. Together they challenge the current mainstream 
notion of information as neutral, objective, transcendent. Though my studies of the conceptions of 
information in late 18th century Denmark point to the situatedness of information, other parts of 
my research point to information as being able to transcend contexts and thus genres. From the 
standpoint of genre theory, Bazerman claims that the database in particular blurs and destroys the
chains between the information context and its appearance on the screen (Bazerman, 2012). How 
is this to affect genre theory? 


2 Conclusion 


These perspectives not only discuss and provide critical insights regarding the study of
information from the position of rhetorical genre theory; they also encourage and deepen
theoretical reflection within information studies.


3 References

Abstract

This full day workshop builds a new community of scholars interested in exploring the potential of the 
“Social Studies of Information” (SSI) as a meta-identity for information research informed by the 
humanities and social sciences. We are inspired by the broad field of STS (for either “Science and 
Technology Studies” or “Science, Technology, and Society”). STS-influenced work within iSchools has 
been balkanized across a range of functional classifications and disciplinary identities. These differ from 
school to school, and are often seen as marginal or esoteric within the strongly technical focus of many 
iSchools. Calling this the “Social Studies of Information” acknowledges the shared object of study around 
which iSchools are built. Full program at www.socialstudiesof.info/workshop14.

1 Defining the Social Studies of Information


The workshop aims to bring together a diverse range of scholars interested in exploring the potential of the 
“Social Studies of Information” (SSI) as a meta-identity for information research informed by the humanities 
and social sciences. 

This is inspired by the broad field of STS (for either “Science and Technology Studies” or “Science, 
Technology, and Society”), which exerts an ever deeper influence on a variety of research traditions within 
iSchools. Many faculty are hired with degrees in STS, while others receive a grounding in its concepts and 
perspectives during their education in schools of information or communication. iSchool scholars are well 
represented at meetings of the Society for Social Studies of Science (4S). 

STS-influenced work within iSchools has been balkanized across a range of functional classifications
and disciplinary identities. These identities differ from school to school, and are often seen as marginal
or esoteric within the strongly technical focus of many iSchools. This
includes much work in areas such as information policy, information ethics or philosophy of information, 
values in design, software studies, socio-technical systems, information systems, Kittlerian media studies, 
information history, community informatics, internet studies, and social informatics. STS perspectives are 
increasingly prominent within established Library and Information Science areas such as archival studies 
and information organization. We are not seeking to replace the range of existing identities held by these 
scholars, but rather to create opportunities for them to discover common ground. 

The workshop is an important step towards the growth of a new, broadly based, community of 
iSchool scholars that cuts across these specific identities, mirroring the success of the STS movement in 
providing a space in which specialists of different kinds can productively interact. SSI scholars share a 
commitment to using the methods of the humanities and social sciences to probe what are often thought of 
as exclusively technical domains. Calling this the “Social Studies of Information” acknowledges both the 
connection to STS and the shared object of study around which iSchools are built. SSI is indigenous to the 
iSchool world and covers the full range of information-related work, cultures, practices, and institutions 
rather than being focused exclusively on the use of information technology.

2 Workshop Format


The workshop program is continuing to evolve as of December 2013, the deadline for this repository, and 
so details on the exact sequence and format of activities will be left for our online program at 
www.socialstudiesof.info/workshop14. 

We are structuring the workshop to maximize interactivity and involve as many people as possible. 
Most of the time will be spent either on roundtable discussions, interactive sessions allowing participants 
to introduce their own ideas and research to the community, or breakout groups gathered round tables to 
discuss particular topics of interest and report them back to the larger group. We will also be using social 
media and old-fashioned email to keep the community together before, during, and after the workshop. 


3 Program Participants 


The following have confirmed their participation in the workshop program as speakers, breakout group 
moderators, or panelists. 


• Kimberly Anderson, University of Wisconsin-Milwaukee 
• Alistair Black, University of Illinois, Urbana-Champaign 
• Pnina Fichman, Indiana University, Bloomington 
• Anne Gilliland, UCLA 
• Thomas Haigh, University of Wisconsin-Milwaukee 
• Maria Haigh, University of Wisconsin-Milwaukee 
• Jenna Hartel, University of Toronto 
• Caroline Haythornthwaite, University of British Columbia 
• Lai Mai, University College, Dublin 
• Annette Markham, Aarhus University 
• Lilly Nguyen, University of California, Irvine 
• Nadine Kozak, University of Wisconsin-Milwaukee 
• Anabel Quan-Haase, University of Western Ontario 
• David Ribes, Georgetown University 
• Howard Rosenbaum, Indiana University, Bloomington 
• Kalpana Shankar, University College, Dublin 
• Kristene Unsworth, Drexel University 
• Howard White, Drexel University 
• Kelvin White, University of Oklahoma 
Abstract 

Spatial information in the form of location-based services, spatial queries, geographical information 
systems, navigation systems and other spatially aware tools has become commonplace in the past 
decade. This workshop reviews cutting-edge research in the area of spatial information science, including 
navigation and wayfinding, the use of shared spatial information, location-based privacy, big (spatial) 
data, volunteered geographic information, empirical studies on spatial cognition, and other recent 
developments in the field. 
1 Introduction 


With the widespread growth of GPS-enabled devices and tools, spatial information in the form of 
location-based services, spatial queries, geographical information systems, navigation systems and other 
spatially aware tools has become commonplace in the past decade. The rapid growth of interest in spatial 
information across the iSchools is being recognized through this workshop, which reviews cutting-edge 
research in the area of spatial information science, including navigation and wayfinding, the use of shared 
spatial information, location-based privacy, big (spatial) data, volunteered geographic information, 
empirical studies on spatial cognition, and other recent developments in the field. 

The workshop will be of interest to those who wish to bring location-aware services into their own 
research or develop new tools that will include various aspects of spatial computing, broadly defined. 
Participants will also be exposed to the benefits and limits, including privacy concerns, of using geographic 
information. In addition, theoretical advances in how spatial thinking is distinct from other forms of 
reasoning will be discussed. The workshop will delineate ways in which spatial information can support the 
information needs and information use of individuals in the research community. Presenters will be 
encouraged to present hands-on examples or demonstrations, where appropriate. 

Participants at the workshop will include both presenters and general attendees. The format of the 
workshop allows for ample discussion, both during the session and through follow-up reports after the 
conference. The call for position papers, as well as summaries of the discussions from the conference, can be 
found on the workshop website at http://www.sis.pitt.edu/~cogmap/sisrig/sym14.html. 


2 Conclusion 


The goals are two-fold. First and foremost, the workshop will delineate ways in which spatial information 
can support the information needs and information use of individuals in the research community. Second, 
attendees will gain an understanding of new domains of inquiry brought forth through discussion with the 
presenters. 
Abstract 

Multiple research areas within the field of information studies grapple with the notion of technology and 
its role in social processes and outcomes. Recent theorizations on sociomateriality reflect a renewed 
interest in studying the mutually constitutive nature of the relationships among technology, materiality 
and social contexts (e.g., Leonardi, Nardi, & Kallinikos, 2012; Orlikowski, 2007). Specifically, the 
sociomaterial perspective offers a promising path for ‘information’ scholars to move from theorizing about 
the “effects” of specific technologies on organizational and societal outcomes to considering the 
constitutive “entanglement” among them. 
1 Introduction 


Sociomaterial approaches are currently being developed in parallel within a diverse set of disciplines 
associated with the ‘information’ field, ranging from science and technology studies (STS) and information 
systems to organizational studies and computer supported cooperative work (CSCW). While the 
interdisciplinary foundations of sociomateriality could definitely be a source of advantage for coming up 
with a robust theoretical framework, it could also be a source of limitation, especially with regard to how 
each discipline has reinterpreted and emphasized certain facets of the sociomaterial perspective at the 
expense of others. As a consequence, the larger objectives of the perspective could be lost, misunderstood, 
or even remain opaque to the broader information research community (Leonardi, Neeley, Hall, & Gerber, 
2011), thereby limiting its practical utility for conducting empirical research despite its extreme relevance 
to the ‘information’ field. 

To address these concerns, we propose an open discussion on the conceptual and practical 
applications of sociomateriality, with a specific focus on conducting a ‘sociomaterial inquiry’, i.e., designing 
and executing empirical research informed by the sociomaterial perspective across a range of phenomena 
that are relevant to the ‘information’ field. We embrace a multitude of approaches to sociomateriality 
represented within our community, reflective of the diverse backgrounds and interests of the proposing 
authors. Therefore, the discussion will be held in a “fishbowl” format (described below), bringing forth 
awareness concerning different empirical approaches that are available for studying sociomateriality in a 
range of contexts. 

In particular, this event will be an opportunity to 1) introduce notions of sociomateriality to new 
audiences; 2) provide a forum for comparing and contrasting different approaches toward conducting a 
‘sociomaterial inquiry’ to understand a range of phenomena that are relevant to the ‘information’ field; 
and 3) share research practices and practical insights regarding the application of these ideas to conduct 
better empirical research. 


2 Intended audience and relevance to the iConference 


The Sociomateriality Fishbowl will explore the ways that an emerging sociomaterial worldview can be used 
to guide empirical research in the information domain. In this way, it will bring together researchers with 
diverse backgrounds (e.g., human-computer interaction, information systems, CSCW, organizational 
studies, and library and information science) to discuss theoretical and methodological approaches and 
associated challenges. 

The inspiration for this event came from a mini-workshop held at the 2012 Summer Institute organized 
by the Consortium for the Science of Socio-technical Systems (CSST 2012). We would like to bring this 
conversation to the iConference in order to take advantage of the diversity, openness and wealth of 
experience of this community. Our goal is to build a stronger collective understanding of sociomateriality 
to lay groundwork for productive conversations in the future. We believe that the interdisciplinary nature 
of the information field makes it well suited to train scholars and produce knowledge about the “entanglements” 
of people, data, technology, organizations and institutional arrangements. 

Attention to the intersection between people, information and technology defines the core of 
information research (Dourish & Mazmanian, 2011; Orlikowski & Iacono, 2001). Therefore, we strongly 
believe the proposed fishbowl session has the potential to engage a broad range of iConference attendees. 
The fishbowl organizers hold different backgrounds and are pursuing distinct research questions, but our 
shared interests in sociomateriality will enable us and the audience to explore the topic across different 
levels of analysis, research domains and theoretical lenses. Additionally, our experience with a fishbowl 
discussion on the topic of Materiality at the iConference 2011 (Seattle, WA) indicates both senior and junior 
researchers are interested in this topic and it is seen as an emerging and thriving research endeavor. The 
proposed fishbowl will create continuity from one conference to another and allow researchers to touch base 
and keep abreast of recent developments in this space. 


3 Proposed activities 


The typical fishbowl format involves an open room with chairs that can be arranged in a series of concentric 
circles. In the center of the room, five chairs are placed facing each other in a close ring. This is the 
“fishbowl.” Rows of additional seats are arranged around this circle. Four discussion participants sit in the 
fishbowl, always leaving one seat empty. All members of the audience are welcome to join in the discussion, 
but they must be seated in the “fishbowl” in order to speak. When someone enters the fishbowl, they take 
the empty seat and one of the previous speakers must step out of the conversation. 

The fishbowl format is ideally suited to discussions of emerging topics, as it enables multiple 
members of the community to contribute questions, insights, and challenges. Junior and senior researchers 
are given equal opportunity to address the group, and the format ensures a continual mixing of opinions 
and perspectives. For this fishbowl discussion, a series of focusing questions will be offered, including: 


• What does a sociomaterial approach contribute to our understanding of current research problems? 
  What do sociomaterial approaches make visible that others do not? 
• What are the different flavors of sociomateriality (digital materiality, visual materiality, 
  immateriality, etc.) and how do they differ from each other? 
• What are the different theoretical lenses related to sociomateriality (e.g., performativity, mangle of 
  practice, imbrications, apparatuses, agencement, actor-networks/sociology of associations, etc.) and 
  how do they differ from each other? 
• How have researchers applied sociomateriality in different domains? 
• What are the methods that are well suited to this approach? 
• What are the outlets and publication venues best suited to research that takes a sociomaterial 
  approach? 


4 Length and number of participants 


We anticipate the fishbowl discussion running for 60-90 minutes, with approximately 15-25 participants. 


5 Special requirements 


We do not anticipate any other special needs beyond those related to the seating configurations described 
above, unless the room is very large or the acoustics are very poor. In that case, a microphone would be 
helpful. 
Abstract 

Although scholars have continually thought to focus on what information may be, questions about how 
and why information comes to us have been neglected and remain poorly understood. Our session seeks 
to address this lacuna by exploring historical conceptions of information and by developing the idea of 
“systems of information provision.” The conversation will engage in such questions as: What is to be 
gained by considering history in explorations of big data, data analytics, and informational systems? On 
the other hand, what hazards lie in a study of information that does not account for the forces of history? 
How does the current “data-driven” moment shed light on the past? How might iSchools enrich their 
programs by offering historical perspectives on the study of data, documents, information, libraries, 
archives, networks, and technologies? And are there dangers in information histories taking an 
instrumental cue from present-day information requirements and issues? 
1 Description 


Although scholars have continually thought to focus on what information may be, questions about how and 
why information comes to us — the historical circumstances of its provision — have been neglected and remain 
poorly understood. Our session seeks to address this lacuna by exploring historical conceptions of 
information and by developing the idea of “systems of information provision.” The conversation will engage 
in such questions as: What is to be gained by considering history in explorations of big data, data analytics, 
and informational systems? On the other hand, what hazards lie in a study of information that does not 
account for the shaping forces of history? How does the current “data-driven” moment shed light on the 
past? How might iSchools enrich their programs by offering historical perspectives on the study of data, 
documents, information, libraries, archives, networks, and technologies? And are there dangers in 
information histories taking an instrumental cue from present-day information requirements and issues? 


2 Format 


The Graduate School of Library and Information Science at the University of Illinois has a long-standing 
tradition in fostering historical explorations of books, libraries, and librarianship, and has been part of the 
iSchools organization since 2003. Today, almost a third of its faculty self-identify as historians of 
information. For this reason, a core of participants from Illinois will act as hosts of an informal “party,” 
and enlist invited guests in a public conversation about the history of information. The notion of party sets 
the appropriate mood for this spirited and shared engagement of the social, cultural, political, and economic 
issues around the provision of information. There will be room for open-ended debates and good-natured 
disagreements, as well as re-negotiations and the promise of future collaboration. 
3 Participants 


Bonnie Mak is a historian of books at the University of Illinois, where she is assistant professor. She 
combines her background in medieval studies with explorations of the production and circulation of 
knowledge in the 21st century. Her book, How the Page Matters (2011), examines the interface of the page 
from the fifteenth century to the present day, and the forthcoming article, “Archaeology of a Digitization,” 
scrutinizes the historical circumstances surrounding the construction of a database. 


Alistair Black is a historian of libraries and librarianship at the University of Illinois, where he is full 
professor. He is author of A New History of the English Public Library (1996) and The Public Library in 
Britain 1914-2000 (2000), and co-author of The Early Information Society in Britain, 1900-1960 (2007) 
and Books, Buildings and Social Engineering (2009), a socio-architectural history of early public libraries in 
Britain. He is currently exploring the history of corporate libraries and staff magazines, and the design of 
public libraries in the 1960s. 


Dan Schiller is a historian of telecommunications. He writes extensively on the development of digital 
capitalism and the social history of U.S. telecommunications. His forthcoming book is entitled Digital 
Depression. He was recently co-PI of a grant that supported a doctoral specialization in “Information in 
Society” at the Graduate School of Library & Information Science at the University of Illinois, where he is 
appointed full professor. 


To represent a diverse range of interests and perspectives, the following guests have been invited: 


• William Aspray is the Bill and Lewis Suit Professor of Information Technologies at the University 
  of Texas at Austin. He is a historian of science and technology, and editor of Information & Culture: 
  A Journal of History. 

• Brian Beaton is a historian of recent science and technology at the iSchool at the University of 
  Pittsburgh, where he is beginning his second year as an assistant professor. 

• Greg Downey is a historian of information labor and Director of the Center for the History of 
  Print & Digital Culture at the University of Wisconsin-Madison, one of the newest institutions to 
  join the iSchools organization. 

• Heather MacNeil is an archival scholar. She is appointed full professor at the iSchool, University 
  of Toronto. 

• Laura Skouvig is associate professor at the Royal School of Library and Information Science, 
  University of Copenhagen. She is a cultural historian of information in early-19th-century Denmark. 


4 Purpose, Intended Audience, Relevance to the Field 


The purpose of the conversation is to locate the history of information within the iSchool movement; explore 
how information history is represented and taught across different iSchools; and showcase the relevance of 
historical research to the investigations of society, culture, information, and technology that constitute the 
shared focus of the iSchools. 

Others are more than welcome to join the party. Audience members may be historians themselves, 
or perhaps scholars and students who are interested in learning more about humanistic approaches in the 
examination of information and technology. The discussion will therefore help to raise awareness of the 
diverse ways in which history is, and could be, taken up by iSchools in the study of information. 
5 Proposed Activities 


The hosts will offer short perspectives on the history of information, focussing on an aspect of research or 
teaching in the iSchool environment. These brief presentations will be followed by a lively conversation with 
invited guests about their own pursuit of information history, in different contexts, institutions, and 
countries. Audience members may then join in, ask questions, and share their relevant experiences. 
Abstract 

The ways in which people interact with information are evolving rapidly. Yet, the academic disciplines 
of information science and learning science—seen by some to be orthogonal domains (Bates, 1999)—often 
approach related issues of sociotechnical culture, context, and computation from highly proximate yet 
vastly “siloed” perspectives. In the spirit of the conference’s ‘breaking down walls’ theme, we seek to 
begin breaking down some of the barriers that separate information and education studies. We have three 
direct goals in organizing this session: 1) To identify and reinforce the nascent community of scholars 
within the iSchool community that have interests in the intersection of information and learning; 2) To 
discuss and sharpen key ideas at the intersection of information and learning science that participants 
could leverage for future scholarship; 3) To identify and develop a set of concrete takeaways related to 
our theme such as ideas for future research proposals, journal articles, and/or applications. Our primary 
method for achieving these goals will be via a set of brainstorming activities that focus on identifying 
and understanding points of synergy between information and learning science. 
1 Introduction 
“Conversations lead to ideas, ideas to projects, and projects to positive change.” 
~Designer John Bielenberg, Thinking Wrong 


The ways in which people interact with information are evolving rapidly. We are fast moving away from 
clearly demarcated technologies and arenas for information sharing or learning, and instead moving toward 
blended realms of public, peer-oriented interaction made possible by new social norms and technological 
affordances. Despite this rapid activity in the real world, the academic disciplines of information science 
and learning science—seen by some to be orthogonal domains (Bates, 1999)—often approach related issues 
of sociotechnical culture, context, and computation from highly proximate yet vastly “siloed” perspectives. 
In the spirit of the conference’s ‘breaking down walls’ theme, we seek to begin breaking down some of the 
barriers that separate information and education studies. We feel that the time is now to reinforce how 
these complementary domains can strengthen existing synergies and mutual topics of concern, such as the 
role of social media in human cognition and development, learning analytics, open education systems, and 
technology-mediated informal learning. 

This interactive session seeks to start this ball rolling by means of conversation and collaboration. 
We have three direct goals in organizing this session: 


1) To identify and reinforce the nascent community of scholars within the iSchool community that 
have interests in the intersection of information and learning; 
2) To discuss and sharpen key ideas at the intersection of information and learning science that 
participants could leverage for future scholarship such as articles for the special issue of The 
Information Society that Ahn and Erickson are co-editing or other scholarly work; 

3) To identify and develop a set of concrete takeaways related to our theme such as ideas for future 
research proposals, journal articles, and/or applications. Our primary method for achieving these 
goals will be via a set of brainstorming activities that focus on identifying and understanding points 
of synergy between information and learning science. The organizers all work at the intersection of 
these areas and are known for their cross-disciplinary collaborations and research contributions. 


2 Activity Plan 


2.1 Part 1: Opening [30 minutes] 


The session will begin with a short introduction, followed by a brief talk by Caroline Haythornthwaite, 
Professor and Director of the iSchool at the University of British Columbia. Dr. Haythornthwaite has an 
international reputation in research on information and knowledge sharing through social networks, and 
the impact of computer media and the Internet on learning and social interaction. Her research includes 
empirical and theoretical work on the development and nature of networks, crowds and communities online, 
the transformative effects of the Internet on how, where and with whom we learn, analytics of networks 
and learning, and distributed knowledge processes. Current initiatives address the role of social media for 
promotion of health and well-being (http://socialweb4health.pwias.ubc.ca/), development of the Society for 
Learning Analytics Research (http://www.solaresearch.org/), and examination of new media and literacy 
(http://blogs.ubc.ca/newliteracies/). Major publications include The Internet in Everyday Life (2002, with 
Barry Wellman); Learning, Culture and Community in Online Education: Research and Practice (2004, 
with Michelle M. Kazmer); the Handbook of E-learning Research (2007, with Richard Andrews); and 
E-learning Theory and Practice (2011, with Richard Andrews). Further information can be found on her 
website http://haythorn.wordpress.com/. 


2.2 Part 2: Interactive Engagement [45 minutes] 


We have organized three breakout sessions, each moderated by one of the organizers, that will engage 
participants in meeting one another, prompting conversations, and finding areas of common interest. To 
keep participants moving and ideas flowing, these activities are meant to be conducted in a circuit by 
everyone in the session; small groups will shift from one activity to the next in 15-minute intervals. The 
goal of each activity is to activate different ways of thinking, and leverage different strengths and levels of 
experience among participants. 


• Activity One: Card Sort. Small groups will arrange a series of concept cards to form a unique 
  organization scheme, relating the supplied terms to their own collective research. Concept cards will 
  include terms drawn from information science, the learning sciences, and other terms that might be 
  employed by several scholarly domains. This activity will surface preconceptions and assumptions 
  about the applicability of theories, methods, and concepts to individual scholars’ work. Each 
  individual participant will have 3 blank cards on which they can contribute their own terms. Images 
  of each group’s ordering scheme will be captured and used as part of the session debrief. 

• Activity Two: Rapid Brainstorm. Small group participants will brainstorm using different colored 
  sticky notes and markers to 1) develop novel ideas along the intersection between information and 
  learning domains, and 2) note issues that may affect the practicalities of this hypothetical 
  interdisciplinary research. Brainstorm topics will be wide open to participants’ imaginations, though 
  we will offer prompts to aid ideation such as funding, theory, method, and publication. As each 
  break-out group completes its brainstorm, its notes will be combined with artifacts from previous 
  groups to create a “heat map” or density graph of popular ideas and issues. 

• Activity Three: Sketching and Mapping. The organizers will supply participants with several topics 
  that reside at the intersection of information and learning science, such as analyzing learning in 
  massive open courseware systems. Participants will be provided with a short intro to concept 
  mapping, and then will spend 10 minutes developing a concept map or sketch of this topic area. 
  Completed concept maps will be shared back with the large group during the closing synthesis 
  session. 


2.3 Part 3: Synthesis [15 minutes] 
Each of the three activity leaders will synthesize the activity sessions and draw upon participant 
contributions to give an overall picture of the session, emerging themes, and areas of common interest and 
concern. 


3 Relevance to the Field 


The iConference community is by definition, and with pride, a highly interdisciplinary group of people. We 
are in an advantageous position to identify and reinforce emergent trends and issues that more 
discipline-identified communities tend to overlook. This session presents an opportunity to be scholarly 
entrepreneurs, namely to showcase synergies and build bridges among theoretical and empirical areas relating 
to the intersection of information and learning science that have great research as well as applied potential. 
It also ties directly to a future publication (i.e., a special issue of The Information Society, 
http://www.indiana.edu/~tisj/connecting_fields.pdf) that will have direct impact in broadening this 
discourse among our community as well as others. 


3.1 Participants 


This session will attract scholars at all levels, from students beginning a program of study to established 
researchers looking to explore new areas of interdisciplinary thinking. Based on the format of the session, 
the maximum number of participants should be set at 30. 


3.2 Session Length 


90 minutes 


3.3 Special Equipment 


Sticky notes, markers, large pieces of paper for concept mapping, masking tape 
Abstract 

While pursuing graduate degrees in library and information science (LIS), it is hoped that students will 
learn the basics necessary for competent, inclusive, and caring professional practice. This panel will 
explore how social justice topics and techniques can be integrated in LIS through a variety of contexts 
including curricular, extra-curricular professional development, and research. Social justice integration 
creates opportunities for students to gain a more holistic and inclusive perspective on the relationships 
between people, information, and technology, with the ultimate potential of shaping a more just society. 
The panel topics approach social justice in LIS from a range of professional experiences, drawing on 
concrete examples, interventions, and historical cases. 
While pursuing graduate degrees in library and information science (LIS), it is hoped that students will 
learn the basics necessary for competent, inclusive, and caring professional practice. This requires a blended 
educational approach that emphasizes culture, context, and critical thinking that extends across curricula, 
professional practice, and research. Components of this blended approach include developing students’ ability 
to critically reflect on the role of information technologies and institutions in society, as well as on their 
own positionality and privilege, which shape their practice. Honing these reflection skills is particularly important 
in the current information environment that is shaped by widening wealth gaps, decreased funding for social 
services and education, and increased data surveillance initiatives. Information professionals are involved 
at every level of information provision and technology design and, thus, are uniquely poised to impact the 
communities they serve, as well as broader society. 

This panel will explore how social justice topics and techniques can be integrated in LIS through a 
variety of contexts including curricular, extra-curricular professional development, and research. Social 
justice integration creates opportunities for students to gain a more holistic and inclusive perspective on 
the relationships between people, information, and technology, with the ultimate potential of shaping a 
more just society. The panel topics approach social justice in LIS from a range of professional experiences, 
drawing on concrete examples, interventions, and historical cases: 


• Social Justice as Topic and Tool in the LIS Classroom 
  • Nicole A. Cooke encourages the teaching of social justice in the curriculum as a way to begin 
    addressing the holistic development of future information professionals. 

• Teaching Trayvon: The Value of Teaching and Talking about Race, Gender and Sexuality in the 
  Information Professions 
  • Safiya U. Noble discusses the positive aspects and pitfalls of injecting a course focused on race and 
    gender into the curriculum as a diversity intervention. 

• Inclusions and Exclusions: Reflections on a Reading Group 
  • Miriam E. Sweeney reflects on the history and formation of an extracurricular reading group about 
    race and diversity that extends social justice initiatives outside of the formal learning environment 
    to the broader campus community. 

• Description is a Drag, and Vice Versa: Classification of Queer Identities 
  • KR Roberto offers a historical overview of the ways in which authorized vocabularies have differed 
    from vernacular language commonly used by community members and LGBTQ scholars to describe 
    their own lives. The talk also explores the potential ramifications of these disconnects for queer and 
    gender-nonconforming users. 

• Latina/o Librarians in the Digital Age: An Historical Reflection of Social Justice in Librarianship 
  • Melissa Villa-Nicholas offers a historical reflection on the ways in which REFORMA navigated the 
    digital age and encourages present day organizing tactics surrounding technological equitability. 


Each panelist will give a brief presentation that focuses on a particular case or context where they have 
integrated or applied social justice topics or techniques. After the presentations, the floor will be opened 
for questions, shared experiences, and discussion that probe the broader topic of using social justice to train 
culturally competent information professionals and strive for greater equality and justice in society. 

This panel will be organized as a 90-minute session with the following agenda:


• Presentations (60 minutes, 12 minutes each)
• Group Discussion (30 minutes)


The goals for this session are to: 


• Reflect on the experience of introducing social justice topics and techniques into LIS curricula

• Explore how extracurricular learning spaces may be used to facilitate broader social justice
  outcomes in our institutional communities

• Locate historical research as an intervention that introduces justice-oriented counter-narratives in
  support of curricular goals


This panel emphasizes the culture and context part of the equation in “Breaking Down Walls: Culture,
Context, Computing.” The organizers extend the meaning of context and culture to ask broader questions
about the responsibility LIS educators, professionals, and researchers may have for fostering social justice 
values in the profession. The intended audience for this panel includes LIS educators, practitioners, and 
researchers who are interested in the many ways social justice techniques and topics may be integrated into 
LIS. The panel format uses the experiences and examples brought forth in the formal presentations to 
foster a rich group dialog where participants will be encouraged to bring in their own experiences and 
questions. 


Length: 90 minutes 

Preferred number of participants: 20-30 

Special requests/equipment needs: This presentation requires standard technology (a projector,
laptop, screen), as well as flip charts and markers for group brainstorming.


Abstract

This session for interaction and engagement is organized by members of the EU-funded ACUMEN project,
which aimed at understanding the ways in which researchers are evaluated by their peers and by
institutions, and at assessing how the science system can be improved and enhanced (see
http://research-acumen.eu/). Among the topics to be emphasized are: 1) the role of bibliometric indicators
in evaluations and 2) possible enhancements in the way researchers present themselves in evaluation
situations by extending the information provided in standard CVs and providing a narrative for these,
which in turn helps the evaluators to reach decisions based on richer evidence. To make our model more
concrete, we will present evaluation scenarios and personas at different stages of their career. The
scenarios and personas will motivate the audience to become involved, and a significant part of the event
will be dedicated to discussion and interaction.


1 Background and purpose


We are members of the EU-funded ACUMEN project (2010-2014), which aimed at understanding the ways
in which researchers are evaluated by their peers and by institutions, and at assessing how the science
system can be improved and enhanced (see http://research-acumen.eu/). In the evaluation processes there
are two sides: evaluators and evaluands (Dahler-Larsen, 2011). Researchers and academics often
experience both roles in their careers. Moments of evaluation in science encompass staff recruitment for
job applications, assessment procedures for gaining resources and grants or for being promoted, and the
reviewing of publications and theses. CVs with information on education, previous workplaces, grants,
publications and presentations are one standard instrument for career development. Increasingly,
bibliometric indicators are used in evaluation. ACUMEN reviewed the evaluation processes in science and
their consequences for individual careers as a whole. Our main goal was to reflect on how the individual
researcher can be empowered in those externally driven events.

In this event we summarize the insights of all the ACUMEN members
(http://research-acumen.eu/partners) and engage the audience using means of active participation such as
brainstorming and role games. Among the topics we want to emphasize are: 1) the role of bibliometric
indicators in evaluations and 2) possible enhancements in the way researchers present themselves in
evaluation situations by extending the information provided in standard CVs and providing a narrative for
these, which in turn helps the evaluators to reach decisions based on richer evidence.

Concerning the first topic, we observe that evaluations are often based on the number of 
publications, the publication venues and the citations these publications received. Quite often, actual 
citation counts are replaced by the impact factor (Garfield, 1994) of the journal in which the article is 
published. Journal impact factor is also used as a proxy for the reputation of journals. Currently there is a 
serious ongoing debate regarding the use of the impact factor in these ways (DORA, 2012; Wouters, 2013). 
Another measure frequently used in evaluations of individuals is the h-index (Hirsch, 2005). The h-index 
has limitations as well (Bornmann & Daniel, 2007), and is highly dependent on the data source being used 
(Bar-Ilan, 2008). Special problems arise in the assessment of humanities and social science researchers, for
whom journal publications are often not the norm, citation counts are usually low, and coverage in the
citation databases is poor (Hicks, 2004; Moed, 2005). At the same time, there are additional ways to assess
the impact of research that are not based on citation counts, for example considering downloads (Kurtz &
Bollen, 2010) or impact assessed based on visibility on social media, scientific and general, as measured for
example by ImpactStory (impactstory.org) or Altmetric (altmetric.com) (for the altmetrics manifesto, see
Priem et al., 2010). ImpactStory measures the social impact of diverse “publications”, e.g. datasets,
slideshare presentations and software. For a recent testimony of the success of altmetrics (alternative
metrics), see Kwok (2013). In addition to the social impact and usage of their scientific outputs,
researchers have other skills that they rarely have an opportunity to present, for example
specific scientific or technological expertise, public engagement, and managerial and collaborative
capabilities, which may be relevant to the specific evaluation event.
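
As a concrete illustration of the h-index discussed above: following Hirsch (2005), it is the largest number h such that h of a researcher's publications have each received at least h citations. The short Python sketch below uses invented citation counts (the function name and all numbers are illustrative assumptions, not ACUMEN data) to show how the same five publications can yield different h-index values when two data sources report different citation counts.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for the same five papers in two data sources.
source_a = [10, 8, 5, 4, 3]
source_b = [25, 12, 9, 6, 5]

print("h-index in source A:", h_index(source_a))
print("h-index in source B:", h_index(source_b))
```

Because the value depends only on the ranked citation counts, any difference in database coverage feeds directly into the indicator.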


2 Intended audience 


The intended audience of the proposed event is academics at all stages of their career with experience in
evaluation situations. Students planning for an academic career are also welcome. One goal of this event is
to raise awareness of current debates, practices, and the guidelines developed for practice. In addition, the
future generation of information scientists will, like any other researchers, be both subject and object of
evaluative practices, so we believe they will profit from attending the event in a very direct, practical way.


3 Proposed activities 


Our aim is to present thoughts on the subject and to engage the audience in a lively discussion. To make 
our model more concrete, we will present evaluation scenarios and personas at different stages of their 
career. The scenarios and personas will motivate the audience to become involved, and a significant part of 
the event will be dedicated to discussion and interaction. 

Outcomes of the event will be reported on the ACUMEN website. The link to the report will be 
sent to the event participants. 


4 Relevance to the Conference/Significance to the Field 


As evaluations often involve “impact” measurements, the information science community with its experience 
in bibliometrics is especially well-suited to provide ideas and feedback for our project. For us, “impact” does 
not only include research impact (usually measured in terms of citations or h-indexes), but also societal 
impact, which can be measured in a variety of ways including knowledge transfer, patents or visibility on 
the Web and on social media. The findings of the ACUMEN project are of particular interest to information 
scientists and the iSchool curricula. 

Both we and the audience will benefit from the event: we will receive feedback on our model, and
the participants will get new ideas on how to better present themselves in forthcoming evaluations.




iConference 2014 Judit Bar-Ilan et al. 


5 References 


Bar-Ilan, J. (2008). Which h-index? A comparison of WoS, Scopus and Google Scholar. Scientometrics, 
74(2), 257-271. 

Bornmann, L., & Daniel, H-D. (2007). What do we know about the h index? Journal of the American 
Society for Information Science and Technology, 58(9), 1381-1385. 

Dahler-Larsen, P. (2011). The evaluation society. Stanford University Press.

DORA (2012). San Francisco Declaration on Research Assessment. Retrieved from 
http://am.ascb.org/dora/ 

Garfield, E. (1994). The Impact Factor. Comments in Current Contents, 25(3), 3-7. Retrieved from 
http://wokinfo.com/essays/impact-factor/ 

Hicks, D. (2004). The four literatures of social science. In Henk F Moed, ed., Handbook of Quantitative 
Science and Technology Research, Kluwer Academic, pp. 473-496. Retrieved from 
http://works.bepress.com/diana_hicks/16/ 

Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the 
National Academy of Sciences of the United States of America (PNAS), 102(46), 16569-16572. 

Kurtz, M. J., & Bollen, J. (2010). Usage bibliometrics. Annual Review of Information Science and 
Technology, 44, 1-64. 

Kwok, R. (2013). Research impact: Altmetrics make their mark. Nature, 500, 491-493. Retrieved from 
http://www.nature.com/naturejobs/science/articles/10.1038/nj7463-491a 

Moed, H. F. (2005). Differences between science, social sciences and humanities. In Citation Analysis in
Research Evaluation, pp. 147-152. Springer.

Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Altmetrics: A Manifesto. Retrieved from 
http://altmetrics.org/manifesto/ 

Wouters, P. (2013). The evidence on the Journal Impact Factor. In: The Citation Culture. Retrieved from 
http://citationculture.wordpress.com/2013/06/03/the-evidence-on-the-journal-impact-factor/ 




Networks in Information: An Interactive Engagement of Theoretical and Analytical 
Approaches 


Ava Lew¹, Barry Wellman¹, Rhonda McEwen¹, Zack Hayat¹ and Jenna Jacobson¹


¹ University of Toronto


Abstract 

Networks, whether they are interpersonal, organizational, or mediated by technology, are the essence of
cultural and social worlds. The Networks in Information session will engage attendees in discussions on
the potential uses of a social network approach, including various theoretical and methodological
applications. In addition, participants will have the opportunity to experience the role of information and
network structures in group problem-solving. This entails highlighting how the ways in which individuals
are connected in groups may affect their ability to collectively complete tasks or devise solutions to
problems. Recognizing that the information community is interdisciplinary and that the application of a
social network approach cuts across disciplines, the session is open to anyone curious about the use of a
social network approach in information research and how it applies to specific contexts (cultural,
organizational, social and technological). No prior knowledge of the social network approach is required.


1 Introduction


The purpose of the Networks in Information session is to increase awareness and generate conversation
among a diverse information audience about the potential uses of a social network (SN)¹ approach in
information research. Recognizing that the information community tends to be interdisciplinary, and that
the application of an SN approach cuts across disciplines, this session is open to anyone curious about the
use of an SN approach in information research and how it applies to specific contexts (cultural,
organizational, social and technological).


2 Session Overview 


Networks, whether they are interpersonal, organizational, or mediated by technology, are the essence of
cultural and social worlds. As such, this session offers the interdisciplinary information community the
opportunity to engage in discussions and experience the application of an SN approach in information
research from cultural participatory perspectives, which include:


• The impact of technological advancements on a networked society and culture, including the move
  from groups to networks and the rise of networked individualism.

• Patterns of collaboration in research organizations and archival examination of 20 years of computer
  science proceedings that result in a typology of co-authorship networks.

• The role of mobile technology in mediating the networks of youths during a critical transition in
  life; as well as the role of social media in the construction of personal networks.

• The role of information and network structure on collaborative problem-solving as a potential
  microcosm of group (social, cultural or organizational) representation.


¹ For our purposes, the SN approach, theories and methods may be applicable to various types of networks,
as opposed to those perceived to consist of only human relations (Wasserman & Faust, 1994).


The session begins with a panel discussion that includes conversations on how social network concepts and
methods can be used to address questions in information studies, as they apply to different social,
organizational and technological contexts. Each panel member will contextualize this broader message in
the presentation of a specific example, and attendees will be able to partake in discussions on the
presentations, concepts, methods and uses of a network approach.

As an extension of the discussions, attendees will also have the unique opportunity to experience
the role of information and network structures in group problem-solving by participating in an interactive
activity. This activity will enable participants to quickly and easily grasp that the ways in which individuals
are connected in groups (including whom they interact, communicate and share information with) produce
different network (group) structures with varying effects. Such effects include a group's ability to
collectively complete tasks or devise solutions to problems (Bavelas, 1950; Lazer & Friedman, 2007;
McCubbins, Paturi, & Weller, 2009; Mason & Watts, 2012).

During the interactive activity, groups of ten participants will be divided into two subgroups and 
arranged into different group structures. Each subgroup will complete identical word-problem tasks 
according to the directions provided. The groups will be monitored, results will be collected, and attendees 
will be engaged in a discussion of the tasks, which includes the results of the groups' performance and how 
they compare to results from recent studies. While the specific designs for the network (group) structures 
used in the activity are derived from the research literature, the word-problem task used in the activity is 
original. 
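
To make the idea of differing group structures concrete, the sketch below compares two classic communication patterns of the kind studied by Bavelas (1950): a centralized "wheel", in which every message passes through one hub, and a decentralized "circle", in which each person talks only to two neighbours. The five-person layouts, the variable names, and the choice of average shortest-path length as a summary measure are assumptions made purely for illustration; they are not the designs used in the session's activity.

```python
from collections import deque


def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        # Breadth-first search gives hop counts from `source` to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(d for n, d in dist.items() if n != source)
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical five-person subgroups (node 0 is the hub of the wheel).
wheel = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
circle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

print("wheel :", average_path_length(wheel))
print("circle:", average_path_length(circle))
```

Path length is only one of several possible summaries; measures of centralization, such as how many links each member has, would highlight other differences between the same two structures.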


3 Agenda 


Panel Discussion: Conceptualizing Social Networks in Information (40 minutes)

5 Minutes   Ava Lew: Welcome and outline of the interaction session; introduction of the panelists.

5 Minutes   Barry Wellman: Defining SNs, and the relationship between technological advances and
            conceptualizations of social networks over time.

5 Minutes   Rhonda McEwen: The integration of social network analysis in information research for a
            longitudinal study on the role of mobile phones in the relationships of youths in transition.

5 Minutes   Zack Hayat: The use of SNA in information research to study co-authorship over 20 years
            of proceedings of a computer science conference in the case of CASCON (Centre for
            Advanced Studies Conference). This is based on work done in collaboration with Dr. Kelly
            Lyons.

5 Minutes   Jenna Jacobson: Deconstructing the branding of personal networks through the use of social
            media.

15 Minutes  Discussions with audience


Interactive Activity: Effects of Information Flow and Network Structure on Group Task Completion
(50 minutes)

10 Minutes  Ava Lew: Describe game and rules for completing the task; organize attendees into groups
            and distribute game.

10 Minutes  Groups complete game and results are collected.

25 Minutes  Presentation of results and engagement of attendees in discussions.

5 Minutes   Wrap-up and thank you.


Total length of time: 90 minutes 


4 Relevance to the Conference/Significance to the Field 


In keeping with the iConference 2014 theme, "Breaking Down Walls: Culture, Context, Computing",
this interaction and engagement session will help to increase awareness among members of the diverse
information community regarding the potential uses and integration of an SN approach in information
research. This will be accomplished through the demonstration of specific examples related to various social,
organizational, or technologically mediated areas of life. Further, while SN concepts and methods have been
used in information research (Chatman, 1991, 1992; Haythornthwaite, 1996, 2002; Haythornthwaite &
Wellman, 1998; Park, 2003; Lu, 2007; Yang, Adamic, & Ackerman, 2008), and though some information
researchers are familiar with this area, a number of members of the information community
have not been exposed to the SN perspective or do not understand how this approach is applicable to
information studies. In light of this, there is value in providing attendees with a new perspective that they
may not have previously considered, and an understanding of how an SN approach may apply to information
research that is concerned with people, organizations or other entities embedded in networks, which are
often mediated by technology in many of today's cultures and societies. Significantly, attendees will leave
the session armed with a new approach in their conceptual toolbox that they can integrate into their own
research.


Abstract

2013 marks the 10th anniversary of the death of Rob Kling, one of the founders of social informatics in
North America. This session for interaction and engagement will provide researchers in the field an
opportunity to reflect on his legacy, to discuss the current state of the social study of technology, focusing
on building bridges between social informatics and sociotechnical research, and looking to the future of
the overlaps between these fields. This session is intended for doctoral students, early career and
established researchers interested in social informatics and/or sociotechnical research and, more broadly,
in the social study of computing.


1 Purpose and Intended Audience:


2013 marks the 10th anniversary of the death of Rob Kling, one of the founders of social informatics in North
America. This session for interaction and engagement will provide researchers in the field an opportunity
to reflect on his legacy, to discuss the current state of the social study of technology, focusing on building
bridges between social informatics and sociotechnical research, and looking to the future of the overlaps
between these fields. This session is intended for doctoral students, early career and established researchers
interested in social informatics and/or sociotechnical research and, more broadly, in the social study of
computing. One expected outcome is for a conversation to begin among participants about the ways in
which they can position their research agendas within the social study of computing, taking advantage of
insights from social informatics and sociotechnical research. The session will also foster connections that
begin to bridge the gaps between the sociotechnical and social informatics scholarly communities.


2 Proposed activities including agenda, ramp-up (development), and follow-through: 


This will be an interactive panel that will begin with several short presentations (10-12 minutes) by: John 
King, Steve Sawyer, and Ingrid Erickson. These three scholars, at different stages of their careers, will 
present their thoughts on the relationships between sociotechnical research and social informatics, ways to 
build bridges between these epistemic communities and the important research questions that researchers 
interested in the social study of computing should be concerned with going forward. They will also comment 
on the legacy of Rob Kling. There will be time for questions after these presentations. 

Following their presentations, the audience will move to small discussion tables, each of which will
be led by one of the organizers listed below. Three of the table leaders (Fichman, Rosenbaum, and
Shankar) are senior scholars who have published and presented extensively on social informatics, and the
fourth (Nemer) is an advanced doctoral student who is working at the intersection of social and community
informatics. Participants will be asked to discuss their own work in light of the presenters’ comments and
speculate about the types of research initiatives and questions that will motivate social informatics and
sociotechnical research in the next five years. Table leaders will take notes as the discussion proceeds,
looking for interesting themes. After 30 minutes, table leaders will report back to the audience, summarizing
the themes.


3 Relevance to the Conference/Significance to the Field:


The topic of bridging social informatics and sociotechnical research is relevant for the iConference because 
researchers in both communities make up a considerable number of conference attendees, faculty and 
students at iSchools. There have been sessions on these topics at past conferences and there should be 
considerable interest in the session. This session is of significance to the field because many researchers in 
the social study of computing, especially early career people, are thinking hard about how to position their 
work, and the discussions may give them a sense of how to proceed. We hope that this session will be of 
use to scholars from both communities as we learn how to build bridges that will open paths for productive 
collaborative research into the social study of computing. 


Length: 90 minutes
Preferred number of participants: Open

Participants

Panelists
• John King, University of Michigan, jlking@umich.edu
• Steve Sawyer, Syracuse University, ssawyer@syr.edu
• Ingrid Erickson, Rutgers University, ime7@scarletmail.rutgers.edu

Organizers/Table Leaders
• Howard Rosenbaum, Indiana University, hrosenba@indiana.edu
• Pnina Fichman, Indiana University, fichman@indiana.edu
• Kalpana Shankar, University College Dublin, kalpana.shankar@ucd.ie
• David Nemer, Indiana University, dnemer@indiana.edu


Abstract

The purpose of this session on the topic of “Interaction and Engagement” is community building and an 
exploration of the ethical dimensions of information and marginality. We will examine challenges, 
methodologies and theoretical frameworks related to work with immigrants, and other underrepresented 
communities. We will use Performative Social Science (PSS) (combining oral history and
autoethnography) (Guiney Yallop, Vallejo de Lopez, & Wright, 2008) to tackle some potential limitations
that stem from our privileged positions and ability to border cross in both the physical and metaphorical
sense. Our goal is to foster the creation and dissemination of new knowledge in order to investigate deeper
information issues and challenges with underrepresented groups. The session will appeal to scholars,
researchers and practitioners interested in development work, digital divides, digital inclusion,
underrepresented communities, marginality and immigration studies.


Keywords: ethics, marginality, engaged research, underrepresented communities, immigration


1 Introduction


This workshop is on “Interaction and Engagement” as community building, an exploration of the ethical
dimensions of information and marginality. We will examine challenges, methodologies and theoretical
frameworks related to work with immigrants, and other underrepresented communities.

We will use Performative Social Science (PSS) (combining oral history and autoethnography)
(Guiney Yallop, Vallejo de Lopez, & Wright, 2008) to tackle some potential limitations that stem from our 
privileged positions and ability to border cross in both the physical and metaphorical sense. Our goal is to 
foster the creation and dissemination of new knowledge in order to investigate deeper information issues 
and challenges with underrepresented groups. The session will appeal to scholars, researchers and 
practitioners interested in development work, digital divides, digital inclusion, underrepresented 
communities, marginality and immigration studies. 

Dr. Gomez, a talented group facilitator, acting as a “talk show host”, will moderate the event. It
will be organized in three parts (approximately 30 minutes each).


1.1 Workshop Part I


Participating panelists will share specific questions/issues that they have faced when dealing with sensitive 
immigration and information related work (2-3 minutes each person). The goal of this exercise is to sensitize 
the audience to the kinds of theoretical, methodological, and ethical problems and complexities that emerge 
in information-related research and issues of marginality (Hughes, 1949). 

Questions that they can draw on include, but are not limited to:¹


• In what ways do my racial and cultural backgrounds influence how I experience the world, what I
  emphasize in my research, and how I evaluate and interpret others and their experiences? How do
  I know?

• What is the historical landscape of my racial and cultural identity and heritage? How do I know?

• What are the cultural and racial heritage and the historical landscape of the participants in the
  study? How do I know?

• In what ways do my research participants’ racial and cultural backgrounds influence how they
  experience the world? How do I know?

• How do I negotiate and balance my own interests and research agendas with those of my research
  participants, which may be inconsistent with or diverge from mine? How do I know?

• What are and have been some social, political, historical, and contextual nuances and realities that
  have shaped my research participants’ racial and cultural ways or systems of knowing, both past
  and present? How consistent and inconsistent are these realities with mine? How do I know?


Additional Questions: 


• Talk about a time in which you realized you were an outsider with regards to information (at the
  margins).

• What did it mean to you at that time?

• How did you feel? What did you do about it in the research context?

• How did you resolve it?

• How do you approach it now when faced with border crossing challenges in research?

• Experiences with alternative methodologies to deal with insider/outsider navigation?


As the topics are being discussed, key themes will be captured and displayed on a computer screen for the 
audience to see. This will be important for the next part. 


1.2 Workshop Part II 


The Moderator will invite all panel and audience participants to discuss one of the key topics related to 
Information and Immigration, per the preceding discussion. The points presented on the computer screen 
will serve as starting points for these discussions. The panel participants will serve as discussion facilitators 
during the small group discussions. 

Tentative topics include: 


• Sharing Data Collection Experiences

• Ethics

• Developing new methodologies along the line of Elfreda Chatman’s work (e.g., Chatman, 1992)

• Pros and cons of Translocal Analysis (Cvetkovich & Kellner, 1997)

• Local Development (Engaged Scholarship, Partnerships)

• Breaking down walls and seeing blind spots


¹ Questions 1-6 are based on Milner (2007).


1.3 Workshop Part III - Plenary Discussion


The Moderator will elicit some of the key issues discussed in each group, and facilitate a wrap-up discussion 
with the panel participants. The panel participants will serve as boundary spanners during the wrap-up
session, expanding and connecting ideas discussed in small groups to larger concepts. Notes will be taken
in real time on a Google Docs spreadsheet that all participants can access; the notes can also be shared via
Twitter so that people outside of the panel can add information and interact with participants
(Quan-Haase, 2013).


2 Conclusion 


Breaking down walls is the perfect theme for this interactive and engaging session that fosters the power of
reflection to navigate the complex space of Marginality and Information. The focus of our proposal is to
create a dynamic, engaging and thought-provoking session to reflect and develop skills as an iSchool
community, particularly those working with underrepresented communities. The provocative nature of the
topics of margins, transnational immigration, border crossing (physical and metaphorical), legality, fear and
context in this space will certainly interest members of the iSchool community interested in pushing the
boundaries of scholarship as we take on storytelling modes to reflect on our confines and privilege and
navigate the delicate area to see what is behind our own research walls.


Abstract

Over the past decade, the individuals and institutions that comprise the iSchool caucus have spent a fair 
amount of time attempting to understand their place within the larger academic landscape, which has 
often been plotted in disciplinary terms. We now stand at a different precipice in need of greater 
understanding and sensemaking: the globalization of the iSchool community. This interactive session is 
dedicated to probing, investigating and imagining our future as a global network of scholars. The spirit 
of the session will be playful and interactive, attempting to build on the existing diversity of the 
participants to question our assumptions, engage in dialogue about our commonalities and differences, 
and imagine new, cross-planet futures for the iSchool community.


1 Introduction


Over the past decade, the individuals and institutions that comprise the iSchool caucus have spent a fair 
amount of time attempting to understand their place within the larger academic landscape, which has often 
been plotted in disciplinary terms. Journal articles detail these acts of sensemaking, as do various workshop 
and conference sessions across the years (e.g., Cronin, 2005; King, 2006; Chen, 2008; Cox and Larsen, 2008; 
Saracevic, 2008; Marco and Javier, 2009; Cox, 2012; Lopatovska et al., 2012; Wiggins and Sawyer, 2012; 
Wu et al., 2012; Bidyarthi, 2013; Wedgeworth, 2013). These efforts have called into being an increasingly 
recognizable and coherent community that strives “to identify, clarify, and speak to the major issues, 
challenges, and driving questions at the nexus of information, technology, and society.” 

We now stand at a different precipice in need of greater understanding and sensemaking: the 
globalization of the iSchool community. The current iSchool caucus, which continues to grow, spans places 
as far away from one another as Champaign-Urbana, Illinois in the United States; Adelaide, South Australia; 
and Kampala, Uganda. As noted at the most recent iConference (Bonnici et al., 2013), the globalization of
the iSchool community is happening on top of older, unresolved debates about the very meaning of
information and the role of information technologies in society. Our proposal takes as its starting point that
the international expansion of iSchools represents a certain type of institutionalization, which presents a 
ready opportunity for critical reflection and inquiry. In addition, we see the introduction of new institutional 
partners and affiliated researchers as a fertile moment to explore new ways of researching, theorizing and 
understanding how people, information, and technology intermix -- an exploration that may also reveal 
lingering gaps and biases stemming from our community’s primarily North American origins. 

We propose an interactive session dedicated to probing, investigating and imagining our future as 
a global network of scholars. The spirit of the session will be playful and interactive, attempting to build 
on the existing diversity of the participants to question our assumptions, engage in dialogue about our 
commonalities and differences, and imagine new, cross-planet futures for the iSchool community. 


1 http://ischools.org/about/history/organization/ 

2 Length


The session we envision will be 90 minutes and will unfold according to the following agenda: 


[15 minutes] Introductions
[15 minutes] Rapid Fire Sticky Note Exercise
[10 minutes] Discussion
[10 minutes] Flash Ideation Session 1
[10 minutes] Flash Ideation Session 2
[15 minutes] Curation Session
[15 minutes] Closing Discussion and Dissemination Charge


3 Format 


We will begin by introducing ourselves to one another following a simple template that both “breaks the 
ice” and imbues the session with a playful, prototyping spirit. We will follow with a rapid-fire “sticky note” 
session meant to elicit a diversity of ideas from the participants related to the globalization of our 
community. Specifically, participants will suggest globalization-related topics, themes, and questions of their 
choosing on a note, affix the notes to a wall (or tabletop) alongside all of the notes from others, and 
collectively begin classifying the notes according to commonalities. We are currently exploring how we 
might digitize this exercise with an online whiteboard tool like Padlet or Lino, which may increase the 
possibilities for spirited participation. We hope this exercise will raise issues
that might have been overlooked in the iSchool globalization discourse to date, and that clustering common 
themes will reveal conceptual hot spots and research opportunities that could fuel follow-on activities such 
as future publications or iConference panels and workshops. 

This ideation/classification task will be followed by a brief discussion to reflect on the conceptual 
hot spots and their perceived meaning among the participants. 

Following these warm-up exercises, the participants will break into small groups to engage in a 
second ideation task, this time prompted by the organizers to fill in pre-printed strips of paper with phrases
such as “In 5 years, an iSchool will be a place ____” or “In the future, information will ____”. Groups will
rotate during Flash Ideation Session 2 to engage in the same task at a different table that has a set of
different ideation prompts. During the curation session, individuals will
curate and cluster the strips from all three tables to create presentable artifacts (i.e., affix a group of strips
together to fashion a statement or commentary of some kind). We will work with the local organizers to 
find an appropriate venue for presenting these creations for public view during the conference, such as 
during the poster session or in the room where coffee breaks will occur. We particularly value this 
opportunity to reflect our work back to the community because of the emergent and fertile nature the topic 
of globalization has within our community. We expect that some type of public viewing may act as a type of
gentle provocation to open up this area for further thought and discussion more broadly. 

With regard to other outcomes, we plan to document the experience with photographs and will 
work with the webmaster at ischools.org to create a blog post or photo reel for the website. Additionally, 
depending on the insights gleaned from the experience, the two organizers may write a piece for JASIST or 
a similar journal that expands on the themes that emerge from the session on the globalization of the iSchool
community.


4 Preferred number of participants 


In the hopes of attracting a diverse and lively crowd to this event, we will not cap participation in any way. 

Abstract

Usage of location-based applications and websites has recently exploded. These tools leverage users’ 
location information, social connections and mobility, and combine anonymity with voluntarily disclosed
spatio-temporal location information to generate opportunities for geographic exploration and social
interaction. The most popular of these systems are designed to enable exploration of the physical
environment, best exemplified by systems like Yelp or Foursquare. However, there have been both
commercial and research efforts to create systems that also enable social exploration, that is, finding
potential new friends within a geo-located framework. Although this is a fairly complex problem domain,
there have been several successful applications enabling social interaction, including Scruff, Grindr, and
Mister. Interestingly, all of these apps target a single community: gay men; apps targeting a wider
audience, such as Blendr or Tinder, have been less successful.

These apps are at a unique intersection of areas of great interest to researchers: mobile apps,
location-based services, and social networking. However, they also involve areas of emerging work, 
particularly for the iConference community. The first of these are questions around the initiation of new 
social ties, ones initiated in the virtual rather than physical world. Second, these applications are often 
geared towards dating or even just sexual interaction, and in particular, for the gay/MSM community. 
Are these applications taking advantage of distinct aspects of the MSM community? Or can some of the 
design space be re-purposed for the larger heterosexual (and/or non-dating) community? 

Ideal participants in this session will have experience in studying or using these systems, and be 
interested in furthering the research and design of these systems. The format of the session will be geared
towards identifying key themes in proposed or on-going research; examining the literature to date;
defining a research agenda and design space; and discussing methods and ethics of research,
especially around sensitive matters like sexuality and dating. We hope to come out of this highly interactive
session with a draft of a research space, an initial outline of the literature identifying core background
ideas as well as gaps for research, and finally, the start of a community of researchers who would be
interested in attending future events, as well as potential collaborators in this emerging area of research.

Abstract

The Analogue Internet: A Post Intervention is an arts-based research project that reflects on the spirit 
of this digital information age. This alternative event activates a new space for meaning-making within 
the conference setting, inviting conference participants to engage in an alternative method of inquiry 
facilitated by an interactive art installation. 

1 Introduction


The Analogue Internet started as a mail-based art project in June 2012. Currently, it has seven installments, 
built from an ever-changing collection of found source materials, sent to an evolving list of international 
subscribers. The project was first inspired by an ecological impulse to salvage discarded books and 
deaccessioned library materials. Excerpts from these recovered resources were cut out, assembled, and 
packaged in an envelope of print and non-print-based miscellany. Each envelope, a legal size (4 1/8” x 9 1/2”)
airmail envelope, was filled with a unique collection of curated ephemera, some pieces directly referencing 
the Internet and others broadly reflecting a rich Information landscape. Sources used for The Analogue 
Internet include do-it-yourself guides, dictionaries, translation books, world almanacs, cookbooks, workout 
books, puzzle books and maps; in addition to other information-rich curios such as seeds, string games, and 
friendship bracelets. Each mailing of the Analogue Internet is a hand-held, hand-made, and hand-delivered 
relief of the digital age, sent to over forty subscribers. 


Figure 1: The Analogue Internet. The contents of the July 2013 Edition.
Figure 2: The Analogue Internet. The contents of the December 2012 Edition.
Figure 3: The Analogue Internet. The contents of the September 2012 Edition.


2 The Installation 


The Analogue Internet: A Post Intervention builds on the arts-based questioning posited by the Analogue 
Internet project, and creates an intervention specifically designed for iConference 2014. The alternative
event will activate a new space of the conference through the installation of an Analogue Internet
intervention. The proposed Analogue Internet: A Post Intervention will exist as a space for the artist to
make and distribute the Analogue Internet and for conference participants to send, read, and discuss the
project. The alternative event plays with established conference practices, such as the receiving of conference
pamphlets, networking, and information sharing, as each conference participant will have the opportunity
to receive an edition of the Analogue Internet, sort through the content, discuss their findings with others, 
and even mail the Analogue Internet to a friend. The Analogue Internet Intervention will be a space for 
connecting, interacting, and information gathering: a mirror to our contemporary digital possibilities. 


3 Theoretical Framework 


Arts-based research is an alternative approach to qualitative research, bridging scholarly inquiry and 
creative processes (Rose, 2007; McNiff, 2008). Arts-based research is a method of inquiry that can employ 
a range of art practices. Art therefore becomes the catalyst for the exploration of questions and theories 
through expressive means of inquiry. Inspired by Relational Aesthetics, The Analogue Internet: A Post
Intervention values social encounters and participation as a method for meaning-making (Bourriaud, 2002). 


4 Relevance to the Conference


The Analogue Internet: A Post Intervention explores the potential of arts-based research within the field
of Information Studies, facilitating inquiry through situational intervention. Through the installation,
conference participants will have an opportunity to gaze upon the diversity of a rich, print-based
information environment, deconstruct the materials, and recontextualize ephemera. The inclusion of
non-print based resources enables the collocation of things that at first seem separate from the
information world but indeed are part of the landscape. The mail-based
framing of the project demonstrates the overall connection of people through information: conference
participants have the opportunity to send someone an edition of the Analogue Internet through the post,
in addition to discussing their own edition within the context of the conference. The Analogue Internet’s
use of decontextualized information is anchored equally on the destruction of resources as it is hinged on
the recycling and reimagining of Information, encapsulating the old and the new, the random and the
organized, the connected and the individualized.

Figure 4: The Analogue Internet. Unopened July 2013 Edition.


5 The Space 


The installation of The Analogue Internet: A Post Intervention is active throughout the iConference.
The installation is comprised of one L-shaped desk with a single chair, and an adjacent area of five
chairs. The artist will be stationed at the desk over the course of the conference, distributing the
Analogue Internet, curating new editions, and answering questions. The table will also function
as a site of exhibition for all print and non-print-based materials used in the mailings. In addition
to the source material, the artist will be equipped with a typewriter, a scale, and stamps.

Figure 5: The Analogue Internet. Preparing the December 2012 Edition.

iConference 2014 Rebecca Noone

Adjacent to the desk, there is a seating area, where people are invited to sit and explore their own 
package of the Analogue Internet. The seating area is a multipurpose space designed to facilitate the 
discussion and exploration of the content within each mailing and the issues raised by arts-based inquiry as 
a whole. The work can be encountered individually, in pairs, or in a group. Participation can be 
characterized as viewing, making, reading, or sending, and can be engaged in throughout the conference: once,
often, or routinely.


6 Conclusion 


The Analogue Internet: A Post Intervention intends to activate new spaces in the conference environment
to create areas for playful meditations on the nature of information, preservation, and participation.

Abstract

The session engages with an acute tension evident in scholarly communication: We are witnessing a great 
deal of innovation and experimentation in relation to the way research is performed and shared. The 
push towards, and need for, innovation and creativity in academic research is being emphasized to an 
ever increasing extent. A rich set of digital tools and transdisciplinary engagements have opened the door 
for research conducted and reported in increasingly hybridised, dynamic and interactive ways. At the 
same time, academic research is increasingly being evaluated by focusing on quantitative analyses based 
on publications; analyses which privilege established scholarly practices and publication venues. In the 
session, we are interested in exploring collectively, on the one hand, the voice in and position from which
we report on and, indeed, conduct research and, on the other hand, how we use documents
and artefacts to tell our stories. Digital media provide new affordances through a broader selection of
modes of representation to present data, results and argumentation. The session is conducted as a
‘conversation café’, where each café table focuses on one aspect of these opportunities.

1 Overview


The session engages with an acute tension evident in scholarly communication. We are witnessing a great 
deal of innovation and experimentation in relation to the way research is performed and shared (e.g.: 
Borgman, 2007; Cyberinfrastructure Council, 2007). As a consequence of multimodal opportunities within 
the scholarly ecosystem (Sugimoto & Thelwall, 2013) in combination with institutional and national 
imperatives (e.g.: Ministry of Education and Research [Sweden], 2012; Research and Innovation Council of 
Finland, 2010), innovation and creativity in academic research are being emphasized to an ever increasing
extent. A rich set of digital tools and transdisciplinary engagements have opened the door for research 
conducted, discussed and reported in increasingly hybridised, dynamic and interactive ways (Francke, 2008; 
Kjellberg, 2010). At the same time, academic research is in many countries being evaluated through 
quantitative analyses based on publications — analyses which privilege established scholarly practices and 
publication venues. Not only do the evaluation systems greatly limit the forms of expression that are valued 
for communicating research, but they also carry the risk of emphasizing a certain segment of publication 
channels available, and encouraging ‘safe’ or immediately recognizable research deemed to have the greatest 
potential of attracting citations from the research community (DORA, 2012). 

Research is an inherently creative practice which comes under pressure because of tensions 
associated with publishing imperatives and organisational challenges (Anderson, 2011). In the session, 
participants will explore collectively two ways to encourage creativity in research. Firstly, how information
studies researchers can broaden their spectrum when it comes to conducting and telling the story
of research by adopting innovative approaches from various theoretical and methodological perspectives
and from the arts. Secondly, digital media provide new affordances through a broader selection of modes of 
representation to present data, results and argumentation, i.e. to tell stories of research. Examples include 
research datasets published in journals (e.g. http://openarchaeologydata.metajnl.com/), laboratory work 
presented as film to allow the laboratory conditions for an experiment to be replicated (e.g. 
http://www.jove.com/), and academic work portrayed in the aesthetics of documentary videos (e.g. 
http://www.audiovisualthinking.org/) or images (Hartel & Thomson, 2011). That research is performed 
differently in different disciplines has been known for a long time (e.g. Becher & Trowler, 2001;
Knorr-Cetina, 1999; Whitley, 2000). Acceptance and promotion of using creative and innovative voices and modes
of representation in telling the story of research are also clearly influenced by disciplinary needs, interests, 
and traditions. If, in information studies, we draw inspiration from other fields and develop our own 
approaches to conducting and reporting research creatively, we also need to discuss how this research will 
be fairly evaluated within our own field and in the broader research policy landscape. 

The session organisers, who offer a rich range of experience and insight about the contemporary 
research climate, share a common belief in the value innovative forms of scholarship have for enhancing our 
research impact and for our engagement with the very communities that we study and hope to support 
through our research. The session will provide a possibility to reflect upon and discuss the participants’
own research (publishing) practices, as well as those practices observed in the research community.


2 Purpose and Intended Audience 


We aim to make the session interactive by drawing the audience deliberately into active engagement with 
the topics discussed. Therefore, we propose a ‘conversation café’ format to identify ways to move forward 
as a community and as individual scholars and professionals in relation to two broad areas of concern: 


• Can the push for innovation and creativity in doing/reporting research and expectations of
indicator-based research evaluation be reconciled?

• What are the possibilities and potential for broadening the voices and modes of representation we
use in telling stories of our research through academic publishing?


The proposed interactive session is targeted towards two groups: 


1. Participants contemplating creative techniques for storying their research. 
2. Participants interested in scholarly communication as a research or professional phenomenon. 


3 Relevance to the Conference/Significance to the Field 


This event speaks to conference themes by exploring ways to break down walls in relation to scholarly 
communication; culture, context and computing all figure in the story of research we wish to discuss with 
our audience. The session addresses the forms and role of publishing in our field, which can appeal to a 
wide audience ranging from experienced researchers and practitioners to research students. 


4 Format of the Activity 


The interactive session will follow a ‘conversation café’ format designed to encourage large group dialogue 
(Brown & Isaacs, 2005). To initiate the conversations, the organisers introduce some of the issues in very 
short opening statements from our different perspectives. Next, participants discuss the topics and move 
around café tables for three rounds of activity as follows: 

1st iteration (focusing): The first group initiates by looking for ideas and opening up the discussion
rather than searching for a solution. 

2nd iteration (deepening & connecting): Starting with a summary of the first group, the second group 
seeks to deepen thinking about the topic. 

3rd iteration (moving forward): Building on the work of the first two rounds, the third round is about
finding ways forward. In particular we want to focus on what can be done to advance the issue beyond the
conference.


The organisers will facilitate and document the process at each café table, using the following starter topics 
to seed discussion: 


> Table 1 - Reinterpreting the voice and role of the researcher

> Table 2 - Remediating the scholarly article in new modes of representation

> Table 3 - Reporting research in new formats and implications for research evaluation

> Table 4 - Revisiting measures of research impact: What ‘counts’ as research?

> Table 5 - Recognising creativity: valuing creative techniques mindfully within the research process


At the end of the final round, each table will present the outcomes of their table topic as part of a closing 
discussion about ways to carry momentum from this session into the future. What could happen next? 
Should there be more? What partnerships might enable this work to take place? Outcomes will be 
disseminated to participants in a format decided at the event. 

Abstract

Much of the current dialogue about personal data is anchored in fear, uncertainty and doubt. There is a 
growing sense ‘big brother is watching’ and that individual rights are being ignored, along with important 
values such as transparency. Recurring themes in the literature are trust, respect, freedom, informed 
consent, self-determinism, control, ownership, sensitivity and the right ‘to be left alone’. Individuals are 
also recognising data is an asset as organisations reap the benefits of linking disparate data to understand 
our preferences, tendencies and buying patterns. The growing conversation around privacy is largely the 
result of the technological capability that produces and harnesses data and its subsequent potential. At 
the same time, opinions about privacy issues are highly contextual. This event intends to stimulate 
thinking and activity around how information professionals can help shape the conversation and 
approaches to data, privacy and ethics. How do we address these issues in our organisations? Are there 
broader responsibilities to ensure educated citizens? We wish to bring together researchers and educators 
within the iSchool community interested in discussing the challenges associated with tackling privacy 
issues in data-intensive organizational contexts, using a participatory format to stimulate reflection and
dialogue. The event builds towards a collaborative discussion of next steps of interest with a view to
sharing outcomes and insights via an online community network.

1 Overview


The US World Economic Forum’s 2012 Industry Agenda paper stated “dialogue about personal data is 
currently anchored in fear, uncertainty and doubt”, which is echoed in much of the literature around data 
privacy (e.g.: Best et al, 2006; Chen & Williams, 2007; Craig & Ludloff, 2011; Friedenwald et al, 2010; Haga 
& O’Daniel, 2011). There is a growing sense ‘big brother is watching’ and that individual rights are being 
ignored, along with important values such as transparency. Recurring themes in the literature are trust, 
respect, freedom, informed consent, self-determinism, control, ownership, sensitivity and the right ‘to be 
left alone’. Individuals are also recognising data is an asset as organisations reap the benefits of linking 
disparate data to understand our preferences, tendencies and buying patterns. The growing conversation 
around privacy is largely the result of the growth of information technology with enhanced capacity for 
surveillance and storage and the “increased value of information in decision-making” (Mason, 1986). It is 
the technological capability that produces and harnesses data and its subsequent potential that creates 
concerns today. At the same time, opinions about privacy issues are highly contextual: 


“We both believe that Google Maps makes our lives easier, the real issue is: what level of privacy
are we willing to give up for that convenience? ...privacy is never a simple discussion of right and
wrong but a nuanced one that must balance opposing views to determine a course of action.” (Craig
& Ludloff, 2011, p. 15)

We cannot assume everyone will make the same trade-off between level of privacy and access to data.

There are many frameworks for organisations tackling privacy issues (Fleury-Lawson, 2010; 
Freeman & Peace, 2005; Mason, 1986; Stiles, 2012; Von Nyssen & Hotchkiss, 2013). The recommendations 
of the World Economic Forum provide a balanced and simple framework that any data-generating 
institution could pursue: 


1. Engage in a structured, robust dialogue to restore trust in the personal data ecosystem. 
2. Develop and agree to principles that encourage the trusted flow of personal data. 
3. Establish new models of governance for collective action. 


A necessary first step in this process is a clear map of the personal data ecosystem of all members of any 
community. As educational institutions, it is also important to understand and fulfil our responsibilities for 
educating students, staff and the general community about privacy issues. For example, as Campbell (2012) 
asks, what can we reasonably advise people to do? Do we have a role in ensuring our community is aware 
of the technical capabilities of tracking via mechanisms such as ‘Cookies’? What is our role in ensuring 
access to information via digital data literacy? 

As we seek to i) better understand our environments through collection, analysis and reuse of data, 
ii) develop strategies, policy and procedures based on analysis of this data, and iii) make decisions about 
resources and services to support individuals in our communities, privacy concerns and issues need to be 
tackled through open and transparent dialogue. This session is an attempt at such dialogue. 


2 Purpose and Intended Audience 


This event intends to stimulate thinking and activity around how information professionals can help shape 
the conversation and approaches to data, privacy and ethics. How do we address these issues in our 
organisations? Are there broader responsibilities to ensure educated citizens? 

We wish to bring together researchers and educators within the iSchool community interested in 
discussing the challenges associated with tackling privacy issues in data-intensive organizational contexts 
to engage with strategic challenges of: 


• educating our respective communities about their data privacy; and
• addressing the privacy of student and staff data.


We welcome anyone with responsibility in this area willing to actively participate in and contribute to the 
dialogue. 


3 Proposed Activities 


A participatory format will be used to stimulate reflection and dialogue: 


Stage 1: Scene setting presentation (10 minutes). Organizers share outcomes of conversations and
research from their institution’s efforts to proactively tackle these issues; offer a general overview 
of privacy laws internationally and benefits of research data sharing. 

Stage 2: Data sharing (10 minutes). Participants work in pairs to share with one another responses 
to a series of questions about what data they collect, what data they think is collected about them, 
what does and does not concern them about what is collected, what price they are willing to pay 
for data. 

Stage 3: Hypothetical play (25 minutes). Proactive strategies about privacy and ethics must be
future-oriented, so these hypothetical scenarios (based on a speculative fabulation technique from 
Anderson & Bawa) bring the unthinkable into representation to explore extremes of good intent, 
evil intent and the spectrum in-between. 

Stage 4: Round table reflections (25 minutes). Participants work in small groups to articulate trends,
issues and concerns arising out of responses to previous stages to identify what we can do to remain
on top of these issues. 

Stage 5: Wrap up and Next steps (20 minutes). Collectively discuss next steps of interest. We 
anticipate creating an online community network and sharing outcomes and insights via an online 
resource, such as wiki or blog. 


4 Relevance to the Conference/Significance to the Field 


Robert Mason opened his 1986 article on this topic with a call to action: 


The question before us now is whether the kind of society being created is the one we want. It is a 
question that should especially concern those of us in the MIS community for we are in the forefront 
of creating this new society. 


Nearly 30 years on, his concerns have even greater resonance — and relevance — for the iSchool community, 
whose own evolution is associated with explosive growth in digital information long surpassing what Mason 
described. As educators and advocates focussed on “understanding the role of information in human
endeavors” (http://ischools.org/about/history/motivation/), data privacy and ethics are core concerns. The 
program format is deliberately designed to provoke thinking and contribute to meaningful dialogue needed 
within our organisations and our field. 

Some might argue the ability to access personal data has already eliminated all walls. However, 
such access exists only for those in positions of power. How do we shift the balance to ensure that adequate 
walls exist to provide the right to choose what is known about one's private life and enable citizens to be
adequately data literate to pursue their own interests?
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Abstract 

The session aims to bring together a group of researchers and educators within the iSchool community 
interested in implementing entrepreneurial thinking in curriculum (teaching and research). 
Entrepreneurship is a contemporary social and cultural movement extending beyond its starting point as 
a management discipline closely related to start-ups to gain a much broader meaning including social 
and cultural entrepreneurship. Today, entrepreneurship can be considered to be a part of a modern 
educational/ “bildung” ideal with the purpose to make pupils and students ready to cope with the 
challenges of modern life. Efforts are made to nurture the entrepreneurial literacies of students. The event 
is aimed at all those who have an interest in entrepreneurship and experimental teaching, in gaining 
experience in the use of alternative teaching methods, and in combining teaching and research 
activities. A participatory format will be used to organise the event. The goal is to produce a document 
that collects the activities and discussion and to also initiate an online community that can provide a 
basis for further work. 
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1 Overview 


Entrepreneurship is a contemporary social and cultural movement extending beyond its starting point as a 
management discipline closely related to start-ups to gain a much broader meaning including social and 
cultural entrepreneurship or as “a skillful way of being” (Spinosa et al., 1997). “Entrepreneurship is 
beginning to be recognized as the greatest source of productivity in both Western and many nonwestern 
cultures” (ibid, p34). Today, entrepreneurship can be considered to be a part of a modern educational / 
“bildung” ideal with the purpose to make pupils and students ready to cope with the challenges of modern 
life. Efforts are made to nurture the entrepreneurial literacies of students. For example, the OECD proposes 
to foster an entrepreneurial spirit and culture (OECD, 2010) and the European Commission suggests 
focusing school curricula on creativity, innovation, and entrepreneurship (European Commission, 2010). 
Adams (2006, p 43) suggests that teaching entrepreneurship can build students’ self-awareness about their 
own capacities and talents. As part of a lengthy examination of the sources of innovation and creativity in 
society, Adams studied the characteristics of successful entrepreneurs and observed that while they often 
have unexceptional backgrounds and academic records, the one thing many entrepreneurs have in common 
is a desire for experimentation and trial and error as their preferred learning style. There is also an 
observable inclination for and tolerance of uncertainty, ambiguity and risk. For this reason, the most 
effective teaching and learning design for such programs employs innovative and experimental methods that 
support experiential learning (ibid, p41-4). 

For the teacher interested in developing such literacies through their curriculum, this prompts 
questions such as: What tools can enable students to shape their own lives and destiny and, together with other 
people, create their own being (including their own job) based on acquired academic and professional skills? 
How can we encourage students to be engaged and responsible citizens? How to develop the students’ 
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knowledge and ambition to take action/ control; to create businesses and jobs, to increase creativity and 
innovation in existing or new organizations? The proposed session engages specifically with the challenges 
of teaching entrepreneurial thinking. Organisers of this proposed session, who have experience with 
experimental teaching, the use of alternative teaching methods and research-based teaching in relation to 
entrepreneurship teaching, will initiate a community discussion around the following questions: 


- In what way can entrepreneurship contribute to LIS + KM research, education and practice? 
- How can LIS + KM research and teaching contribute to entrepreneurship? 
- Should entrepreneurship be implemented in LIS education and, if so, how? 
- How can we implement entrepreneurship in LIS research? 
- What teaching tools and strategies are appropriate? What has worked and why? 


2 Purpose and Intended Audience 


The session aims to bring together a group of researchers and educators within the iSchool community 
interested in implementing entrepreneurial thinking in curriculum (teaching and research). The overall 
objectives are to: 


- initiate and build an academic network based on the participant group around entrepreneurship 
  thinking; and 
- inspire participants to use entrepreneurship and innovation means/ methods/ techniques in relation 
  to teaching activities. 


The event is aimed at all those who have an interest in entrepreneurship and experimental teaching, in 
gaining experience in the use of alternative teaching methods, and in combining teaching and 
research activities. 


3 Proposed activities 


A participatory format will be used to organise the event. The goal is to produce a document that collects 
the activities and discussion and can provide a basis for further work. Preparing a short statement 
collectively could be a unifying basis for community building and giving shape to a joint identity around 
this topic. 

After initial introductions to the event, the session is organised in three parts: 


- “lightning talks” round (15 minutes) in which three examples from current practice will be shared; 
- “idea generation” round (15 minutes) in which participants will be invited to reflect on the lightning 
  talks on their own, in pairs and finally in small groups as part of an exercise designed to build a 
  collective list of techniques of interest to the audience; 
- “technique sharing” round (45 minutes), in which participants will move around a series of tables 
  (that will take shape based on the outcome of the idea generation round) in short, rotating segments. 


The final 15 minutes will be used to wrap up the discussion and discuss next steps of interest to the 
participants. At this stage it is anticipated that an online community network would be established and 
that techniques and insights shared at the event would be disseminated via an online resource, such as a 
wiki or blog. 


4 Relevance to the Conference/Significance to the Field 


Seen from the perspective of the conference theme “Breaking Down Walls”, entrepreneurship is especially 
interesting because it has the capacity to draw together people from different disciplines and encourage 
close collaboration on a joint / mutual multidisciplinary entrepreneurial project. Given that entrepreneurship 
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has become an important part of modern society and increasingly gained ground in education and university 
(see for example: Adams, 2006), it seems a natural thing to systematically implement entrepreneurial 
thinking in information studies. 

Information and knowledge handling/ management/ processing contributes to the creation of many 
new businesses and jobs. It is a core business component of most industries, such as in the many new 
Internet-based companies extending the significance of information and communication industries. In order 
for our students to be able to put their information studies into action, however, the dynamics of the 
future-oriented and agile work environments they will enter make entrepreneurial literacy very relevant to their 
professional success (e.g. in the form of effectuation as discussed in Sarasvathy, 2008). Therefore, we have 
an obligation to implement entrepreneurial thinking within LIS and KM teaching and research to promote 
the process and for the benefit of our students. We hope that this event will motivate and inspire the 
participants to take up the challenge. 
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Abstract 

As organizations produce ever-larger torrents of text, images, sounds and numbers we find an increased 
attention to how researchers and organizational members alike can gain insights from these traces of 
social practices. Those issues have gained attention in the popular press and funding agencies alike. But, 
how do we best study traces of social practices left behind by organizational members? This fishbowl 
session aims to bring together researchers from different disciplines (such as HCI, CSCW, Organizational 
Studies, Information Systems, Library & Information Sciences, etc.) to brainstorm about the different 
approaches they have used (or, are planning to use) in studying trace data and documents, and to become 
aware of different types of methodological approaches to trace data that are pursued in other research 
communities. The session will be followed by short focused interviews with selected participants that 
summarize important themes from the session, which will subsequently be made accessible online. 
Keywords: methodology, trace data, documents, ethnography 

Citation: Østerlund, C., Sawyer, S., Ribes, D., Shankar, K., & Geiger, S. (2014). What to Do with All those Traces People Leave 
Behind: Computing, Culture, and (Bits of) Context? In iConference 2014 Proceedings (p. 1234-1237). doi:10.9776/14250 
Copyright: Copyright is held by the authors. 

Research Data: In case you want to publish research data please contact the editor. 


Contact: costerlu@syr.edu, ssawyer@syr.edu, dr273@georgetown.edu, kalpana.shankar@ucd.ie, sgeiger@gmail.com 


1 What to do with all those traces people leave behind? 


As organizations produce ever-larger torrents of text, images, sounds and numbers we find an increased 
attention to how researchers and organizational members alike can gain insights from these traces of social 
practices. Those issues have gained attention in the popular press and funding agencies alike. One finds a 
steady stream of articles discussing how a data deluge swamps not only the big sciences such as astronomy, 
biology, medicine, and physics, but also the social sciences and humanities (Holtz, 2009). Large organizations 
are also grappling with the burden, opportunity and responsibilities of large data sets. The military, for 
instance, is awash in data from drones (Drew, 2010). 

The rapid growth in data opportunities (and issues) has been on the radar of the funding agencies 
for some time — and of late include grant opportunities for the social sciences and humanities (e.g., NEH, 
Digging for Data). In the private sector many consultant groups and firms (e.g., IDC, Gartner, Fios, 
Attenex) now specialize in the scanning, indexing, and mining of documents. Likewise, we hear calls across 
many intellectual communities for a greater emphasis on data mining and the maintenance and sharing of 
large document repositories as new data options are reshaping scholarly work. 

Studying trace data (whether in the form of text, images, sounds, numbers, etc.) allows scholars to 
position organizational members’ immediate activities and situated routines in their larger social and 
organizational context (Mayernik, Wallis, & Borgman, 2012; Ribes & Lee, 2010; Smith, 2005). As documents 
carry institutional structures and point to both past and future activities they open a window to larger 
organizational practices (Smith, 2005; Boellstorff, Nardi, Pearce, & Taylor, 2012; Hine, 2007; Jirotka, 2005; 
Sawyer, Kaziunas, & Østerlund, 2012). Furthermore, researchers can often access traces of social practices 
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in large document repositories, opening a window to patterns of coordination and knowledge work that goes 
well beyond immediate observations (Geiger & Ribes, 2011; Østerlund, Sawyer, & Kaziunas, 2010; 
Østerlund, 2008; Shankar, 2006). 

But, how do we best study traces of social practices left behind by organizational members? The 
answer may seem tantalizingly straightforward. You gather a pile of what organizational members drop left 
and right and start digging through it. But if you step back and begin looking through your qualitative 
method books you will realize that documents, artifacts and other traces tend to serve as a lower caste in 
field research. Most chapters and articles will help the reader refine their interview and participant 
observation skills. Trace data are often addressed in passing under headlines such as “secondary sources” 
or “unobtrusive techniques,” if at all. Consequently, qualitative researchers develop strong skills in 
producing rich descriptions of the context in which some usually unspecified technology is seen to operate. 
Many researchers appear to treat traces of social practices as they approach interview transcripts and field 
notes — with little regard to how they may hold a unique position in organizational infrastructures and work 
practices. 


2 Intended Audience & Proposed Activities 


This fishbowl session aims to bring together researchers from different disciplines (such as HCI, CSCW, 
Organizational Studies, Information Systems, Library & Information Sciences, etc.) to elevate the discourse 
regarding different approaches they have used (or, are planning to use) in studying trace data, and to 
become aware of different types of methodological approaches to trace data that are pursued in other 
research communities. The session will be followed by short focused interviews with selected participants 
that summarize important themes from the session.¹ These will be edited into a short podcast and made 
accessible online. 


3 Roles and Topic Description 


Steve Sawyer, Syracuse University, will act as the moderator in support of the following fishbowl 
initiators: 

David Ribes, Georgetown University 

Historical Ethnography of Sociotechnical Systems 

Increasingly, organizations are making more and more documentary, trace and other archival data available 
online -- often reaching back into their own archives to conduct systematic digitization and indexing 
endeavors. Historical ethnography draws together the ethnographic sensibility for lived experience, 
members' meanings, and practice, with the documentary methods of archival research. A historical 
ethnographic approach to sociotechnical systems allows us to: 


- Track longitudinal trajectories of technological change, rather than single moments of innovation 
  and adoption. 
- Recover the novelty, surprise, or 'sexiness' of technologies at each moment: while we may have 
  become accustomed to email, instant messaging, and relational databases, they were at one point 
  inspiring, disruptive or to be ignored as a fad. 
- Track the uneven circulation of innovations: a technology that has been normalized, or even 
  considered outdated in some contexts, may be revelatory at other sites of adoption. 


1 E.g., see the interview with Christine Hine at http://www.youtube.com/watch?v=sHvEzvqA0VI&noredirect=1. 
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Kalpana Shankar, University College Dublin 

Reflections on Trace Data in the Study of Social Science Data Archives 

Social Science Data Archives (SSDAs) comprise some of the earliest and most successful efforts to curate 
research data, but are seldom discussed as exemplars in the contemporary discussions on digital 
curation. We will report on how we have been using "trace data" (organizational documents, notes, and 
related texts) in our ongoing comparative studies of several long-standing and established SSDAs to surface 
ideas relevant to today's concerns about data. For this session, we will focus on the advantages and 
disadvantages of using such data to yield insights into organizational practices over time. 


Stuart Geiger, UC Berkeley 

Trace-Ethnography 

Geiger argues that good 'quantitative' trace data analysis (or even 'Big Data' in general) is actually *harder* 
than many other methods, because it rests on an often-unacknowledged qualitative/ethnographic 
understanding of how that trace data is generated and what it means in a specific socio-technical context. 


Carsten Østerlund, Syracuse University 

Documenting work 

Østerlund discusses the benefits and challenges of qualitative research focusing on people’s unfolding 
documenting work. He presents a methodological research strategy integrating the gathering and analysis 
of the online and location-specific documents littering our work environments. 


4 Relevance to the Conference/Significance to the Field 


This fishbowl session is directly relevant to the cross-cutting theme of iConference 2014 (Breaking 
Down Walls: Culture, Context, Computing) as it focuses attention on how scholars can conduct 
computer-supported analysis of trace data and still maintain a rich, contextually and culturally grounded 
methodological approach to the data. Doing so will require us to break down walls of existing methodological traditions. 
To that end, the fishbowl format will promote an interactive discussion by bringing together researchers 
from different disciplines, and bringing forth awareness concerning different approaches that are available 
for studying traces of social practices in context. 
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Abstract 

This panel brings together several presentations on the topic of information failure, particularly through 
tropes of noise, misinformation, error, and breakdown. The four speakers will follow a “Pecha-Kucha” 
style of presentation (thirty slides at twenty seconds a slide) followed by group discussion. Leah Lievrouw 
will consider noise by linking recent discussions of big data with information systems theorists 
Bertalanffy, Shannon, and von Foerster. Colin Doty will discuss misinformation within recent debates 
over vaccine safety. Patrick Keilty will provide a textual analysis of Desk Set (1957) to demonstrate the 
way that error is gendered female in representations of technology. Lastly, Lilly Nguyen will provide a 
semiotic analysis of technological breakdown, drawing from ethnographic fieldwork of software in 
Vietnam. 
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This panel brings together several presentations on the topic of failure in information and technology. 
Failure represents the shadows of the information studies field, typically seen as a negative and undesirable 
quality to be eliminated entirely. Across these four presentations, this panel demonstrates the sticky 
persistence of failure in spite of narratives of efficacy and transparency that typically accompany discourses 
of information and technology. In turn, this panel provides new discussion of an overlooked topic in the 
study of information and technology. These four presentations represent a broad range of approaches to 
failure, particularly through tropes of noise, misinformation, error, and breakdown. Following the 
“Pecha-Kucha” style of presentation, authors will present their talks through a series of slides at rapid pace: thirty 
slides, at twenty seconds for each slide. This presentation format is highly visual, informal, and conducive 
to group discussions and interaction. The panel will conclude with a thirty-minute interactive discussion. 

In the first presentation, Leah Lievrouw presents a historical and theoretical account of big data to 
consider error and breakdown through tropes of noise. Drawing from information and systems theorists like 
Ludwig von Bertalanffy, Claude Shannon, and Heinz von Foerster, she asks: do big data foster emergence 
and non-linearity, or seek to eliminate them? Thinkers like Bertalanffy, Shannon, and von Foerster rejected 
reductionist explanations of natural and social phenomena in favor of complex, holistic views. Instead of 
simple cause-effects models of change, they proposed non-linear accounts in which change emerges 
continuously and unpredictably from the many, complexly interrelated elements of a system and its 
environment -- a principle that von Foerster called “order from chaos.” Contemporary big data capture, 
storage, and analytics would seem to provide the ideal opportunity to observe the emergent, non-linear, 
unpredictable changes of state hypothesized by systems thinkers. Big data advocates insist that data volume 
and new analytics both capture and make an ever-greater proportion of data available and comprehensible. 
That is, big data techniques convert huge swathes of previously unusable data into information, effectively 
reduce or eliminate noise, and vastly extend the possibilities for prediction. Yet some experts see the term 
as over-hyped and misunderstood. Skeptics interpret the same developments as simply shifting the boundary 
between the intelligible and meaningful, on one hand, and an ever-expanding domain of randomness and 
noise, on the other. As one blogger recently put it, “big data is only big when the amount or complexity 
takes you out of your comfort zone.” 

iConference 2014 Lilly U. Nguyen et al. 

Colin Doty explores notions of failure in misinformation, particularly through evidence evaluation in 
beliefs about vaccine safety. The current anti-vaccination movement has cast doubt and fear onto 
vaccines. Anti-vaccine activists insist on the hazards of child vaccination. Such ideas have become 
increasingly popular, thus challenging long-held medical practices. Through the case of misinformation 
in child vaccination, Doty asks: at what point do error and breakdown actually occur? On the one hand, 
misinformation arises from error in evidence evaluation by those who believe and consume the information. 
On the other hand, misinformation is also caused by unvetted amateurs who produce information without 
editorial oversight. Both approaches may oversimplify the problem. Where exactly are the errors and 
breakdowns that cause misinformation? How do we identify what is misinformation and what is not? How 
do the processes of creating information and misinformation differ from each other? And how are they 
similar? Likewise, how do the processes of evaluating information and misinformation differ from each 
other, and how are they similar? 

Patrick Keilty provides a textual analysis of Desk Set (1957) to demonstrate the way that error is 
gendered female in representations of technology, particularly during the computational boom after World 
War II. In the film, Bunny Watson, played by Katharine Hepburn, is a reference librarian at a large 
corporation. Spencer Tracy plays Richard Sumner, an early computer scientist who has been hired to 
introduce a computer, “EMERAC,” an allusion to ENIAC, into the all-women reference library. 
Throughout the troublingly sexist narrative, Richard repeatedly insists that EMERAC can only make a 
mistake “if the human element makes a mistake first.” For Richard, EMERAC is a flawless system for 
retrieving knowledge, while the “human element,” nearly always gendered female, functions as the 
unstable variable of knowledge retrieval, in need of computational improvement. Thus, the film creates a 
dichotomy in which the method of the computer — privileged by the corporation’s financially conscious, 
all-male executives—displaces the method of the erroneous (and female) human. Complicating this 
concept of error, the film creates parallels between women’s information and administrative labor and 
EMERAC’s efficiency. As Mary Flanagan has it, Bunny becomes a metaphoric “bride” in the end and 
defeats the efficient machines of her bachelor suitor. Bunny’s methodological and meticulous, almost 
machine-like command of knowledge, by the end of the movie, allows her to beat the very machines sent 
to replace her. Bunny beats the machine in an uncanny way, saving the day with her genuine human 
knowledge, her way of connecting events and facts in a sensible order. In the end, the machine spins out of 
control, while Bunny remains cool and knowledgeable, displacing the concept of error from human to 
machine. Yet, to the extent that EMERAC is personified as a temperamental “girl,” the concept of error 
remains gendered female. 

Lilly Nguyen’s presentation will provide a semiotic analysis of technological breakdown to explore 
the cultural and political implications of failure in technology-driven economic development. Drawing from 
ethnographic fieldwork in Vietnam, her talk will describe the start-up organizations and entrepreneurial 
communities there. Many members of these communities were foreigners and members of the Vietnamese 
diaspora who came to Vietnam with the intent of helping to “develop” the country. These people saw 
potential among the young workers but also saw deficiencies and failures, leading one entrepreneur to 
describe the country as “ghetto” during an informal conversation. This kind of complaining was regular 
banter for this community. Moreover, this “ghetto” quality ascribed to Vietnam pointed to a specific feature 
of modern life there: persistent breakdown. The talk will start with a technical view of breakdown — of 
machine failure, of disrupted network connections—to further describe the ways that such technical 
breakdown is then extrapolated to signify racial inferiority, political deviance, and modern deficiency. 
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Abstract 

This paper describes a session for interaction and engagement to be held at iConference2014. The session 
for interaction and engagement focuses on researchers at iSchools and as such is an intellectual follow-up 
to the systematic check of all iConference2014 paper submissions in a copying detection system. The 
session offers a platform for discussing whether the use of such a system is justified for a conference that 
attracts submissions from highly respected researchers. Panel members and the audience will discuss the 
amount of text a researcher is allowed to reuse and when a submission would no longer be considered to 
be original and starts to be considered self-plagiarism. Parts of the discussion will center on the question 
of whether information science researchers can actually avoid repeating the same words when today they 
have to publish results from research projects in as many publications as possible. 
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1 Session proposal 


Publish or perish is the boon and bane of editors and conference chairs. It is a boon, because it means 
editors and chairs receive more submissions each year. It is a bane, because the pressure that authors face 
today might lead to copying from others or to reusing one’s own text. 

During the last year, the journal Library Hi Tech, Emerald Group Publishing, received a record 
number of submissions, with one submitted every other day. Of these, 24% had to be rejected because they 
contained a significant portion of copying. Most articles with large copied passages in Library Hi Tech came 
from developing countries, where using the words of others is considered to be a form of homage that 
recognizes the expertise and authority of earlier authors. 

It would be too simplistic to interpret this copying as just a problem in these countries. In Germany, 
for example, more than 50 doctoral theses by, among others, scholars and politicians have been documented 
in VroniPlag Wiki (http://de.vroniplag.wikia.com/wiki/Home) as containing extensive plagiarism. Since 2011, 
three politicians (the former vice president of the European Parliament, the Minister of Defense, and the 
Minister of Education) have stepped down from their positions in the wake of their doctorates being revoked. 
A core player in uncovering these cases of plagiarism was an initiative of pseudonymous persons who have been 
documenting plagiarism in dissertations and habilitations. The session will discuss the activities of these 
scientists and explore how they define and detect plagiarism. 
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This session for interaction and engagement is not primarily about authors from developing countries or 
German politicians who needed a doctoral title for their reputation. It is about researchers at iSchools and 
as such is an intellectual follow-up to the systematic check of all iConference2014 paper submissions in a 
copying detection system. The session offers a platform for discussing whether the use of such a system is 
justified for a conference that attracts submissions from highly respected researchers. 

This year, the iConference received 113 full paper submissions and 74 note submissions. The acceptance rates 
were 35% and 49%, respectively. Two submissions were rejected because identical research had been 
published before; two other submissions were rejected because the authors had already published several 
articles on the same subject and the submissions’ content did not contain enough new research to warrant 
publication. 

The competition between researchers in information science is high. Only the most productive 
researchers will be awarded academic or research positions. Results of research projects are often published 
in as many articles as possible in order to increase a researcher’s output. It has become common 
practice to publish one article about preliminary results, one article on survey results, one article on 
follow-up focus groups and a last article summarizing all results. This “salami-tactic” or “least publishable unit” 
approach increases authors’ publication lists, but is a nightmare for readers. 

It also creates a challenge for both editors and authors. While the results differ 
from article to article, there are only a few ways to explain how one collected, for example, data with a survey. 
Since the background also is the same for all studies, it becomes hard for authors to write an appropriate 
method and background section without copying from previously published articles. 

The session discusses the amount of text researchers are allowed to reuse and when a submission is 
no longer considered to be original and starts to be considered self-plagiarism. Parts of the discussion will 
center on the question of whether information science researchers can actually avoid repeating the same 
words when today they have to publish results from research projects in as many publications as possible. 


2 Panel Members 


Initiating discussion points will be made by the following panel members: 


- Prof. Dietmar Wolfram, in his function as Paper Chair of iConference2014 

- Ass. Prof. Elke Greifeneder, in her function as Co-Editor of Library Hi Tech and Program Chair of 
iConference2014 

- Dr. Sven Fund, in his function as Publishing Director of DeGruyter 

- Prof. Debora Weber-Wulff, in her function as German plagiarism researcher and VroniPlag Wiki 
participant 

- Prof. Michael Seadle, in his function as Head of the Commission on Research Ethics at the 
Humboldt-Universität zu Berlin 

- Dr. Lynn Silipigni Connaway, in her function as an author 

- Prof. Tingting Jiang, in her function as an author 
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